(19) Europäisches Patentamt
     European Patent Office
     Office européen des brevets

(11) EP 0 786 519 A2

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication: 30.07.1997 Bulletin 1997/31

(21) Application number: 97100117.7

(22) Date of filing: 07.01.1997

(51) Int. Cl.6: C12N 15/00

(84) Designated Contracting States:
     AT BE CH DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
     Designated Extension States:
     AL LT LV RO SI

(30) Priority: 05.01.1996 US 9861

(71) Applicant: HUMAN GENOME SCIENCES, INC.
     Rockville, MD 20850-3338 (US)

(72) Inventors:
     • Kunsch, Charles A.
       Gaithersburg, Maryland 20882 (US)
     • Choi, Gil H.
       Rockville, Maryland 20850 (US)
     • Barash, Steven C.
       Rockville, Maryland 20850 (US)
     • Dillon, Patrick J.
       Gaithersburg, Maryland 20879 (US)
     • Fannon, Michael R.
       Silver Spring, Maryland 20906 (US)
     • Rosen, Craig A.
       Laytonsville, Maryland 20882 (US)

(74) Representative: VOSSIUS & PARTNER
     Postfach 86 07 67
     81634 München (DE)

(54) Staphylococcus aureus polynucleotides and sequences 



(57) The present invention provides polynucleotide sequences of the genome of Staphylococcus aureus,
polypeptide sequences encoded by the polynucleotide sequences, corresponding polynucleotides and
polypeptides, vectors and hosts comprising the polynucleotides, and assays and other uses thereof. The
present invention further provides polynucleotide and polypeptide sequence information stored on computer
readable media, and computer-based systems and methods which facilitate its use.





Printed by Jouve, 75001 PARIS (FR) 









Description

The present invention relates to the field of molecular biology. In particular, it relates to, among other things,
nucleotide sequences of Staphylococcus aureus, contigs, ORFs, fragments, probes, primers and related polynucleotides
thereof, peptides and polypeptides encoded by the sequences, and uses of the polynucleotides and sequences thereof,
such as in fermentation, polypeptide production, assays and pharmaceutical development, among others.

The genus Staphylococcus includes at least 20 distinct species. (For a review see Novick, R. P., The
Staphylococcus as a Molecular Genetic System, Chapter 1, pgs. 1-37 in MOLECULAR BIOLOGY OF THE STAPHYLOCOCCI,
R. Novick, Ed., VCH Publishers, New York (1990)). Species differ from one another by 80% or more, by hybridization
kinetics, whereas strains within a species are at least 90% identical by the same measure.

The species Staphylococcus aureus, a gram-positive, facultatively aerobic, clump-forming coccus, is among the
most important etiological agents of bacterial infection in humans, as discussed briefly below.

Human Health and S. Aureus

Staphylococcus aureus is a ubiquitous pathogen. (See, for instance, Mims et al., MEDICAL MICROBIOLOGY,
Mosby-Year Book Europe Limited, London, UK (1993)). It is an etiological agent of a variety of conditions, ranging in
severity from mild to fatal. A few of the more common conditions caused by S. aureus infection are burns, cellulitis,
eyelid infections, food poisoning, joint infections, neonatal conjunctivitis, osteomyelitis, skin infections, surgical wound
infection, scalded skin syndrome and toxic shock syndrome, some of which are described further below.

Burns

Burn wounds generally are sterile initially. However, they generally compromise physical and immune barriers to
infection, cause loss of fluid and electrolytes and result in local or general physiological dysfunction. After cooling,
contact with viable bacteria results in mixed colonization at the injury site. Infection may be restricted to the non-viable
debris on the burn surface ("eschar"), it may progress into full skin infection and invade viable tissue below the eschar
and it may reach below the skin, enter the lymphatic and blood circulation and develop into septicaemia. S. aureus is
among the most important pathogens typically found in burn wound infections. It can destroy granulation tissue and
produce severe septicaemia.

Cellulitis 

Cellulitis, an acute infection of the skin that expands from a typically superficial origin to spread below the cutaneous
layer, most commonly is caused by S. aureus in conjunction with S. pyogenes. Cellulitis can lead to systemic infection.
In fact, cellulitis can be one aspect of synergistic bacterial gangrene. This condition typically is caused by a mixture of
S. aureus and microaerophilic streptococci. It causes necrosis and treatment is limited to excision of the necrotic tissue.
The condition often is fatal.

Eyelid infections

S. aureus is the cause of styes and of "sticky eye" in neonates, among other eye infections. Typically such infections
are limited to the surface of the eye, and may occasionally penetrate the surface with more severe consequences.

Food poisoning

Some strains of S. aureus produce one or more of five serologically distinct, heat and acid stable enterotoxins that
are not destroyed by the digestive processes of the stomach and small intestine (enterotoxins A-E). Ingestion of the toxin,
in sufficient quantities, typically results in severe vomiting, but not diarrhoea. The effect does not require viable bacteria.
Although the toxins are known, their mechanism of action is not understood.

Joint infections 

S. aureus infects bone joints, causing diseases such as osteomyelitis.

Osteomyelitis

S. aureus is the most common causative agent of haematogenous osteomyelitis. The disease tends to occur in
children and adolescents more than adults and it is associated with non-penetrating injuries to bones. Infection typically
occurs in the long end of growing bone, hence its occurrence in physically immature populations. Most often, infection
is localized in the vicinity of sprouting capillary loops adjacent to epiphysial growth plates in the end of long, growing
bones.

Skin infections 

S. aureus is the most common pathogen of such minor skin infections as abscesses and boils. Such infections
often are resolved by normal host response mechanisms, but they also can develop into severe internal infections.
Recurrent infections of the nasal passages plague nasal carriers of S. aureus.

Surgical Wound Infections 

Surgical wounds often penetrate far into the body. Infection of such wounds thus poses a grave risk to the patient.
S. aureus is the most important causative agent of infections in surgical wounds. S. aureus is unusually adept at
invading surgical wounds; sutured wounds can be infected by far fewer S. aureus cells than are necessary to cause
infection in normal skin. Invasion of surgical wounds can lead to severe S. aureus septicaemia. Invasion of the blood
stream by S. aureus can lead to seeding and infection of internal organs, particularly heart valves and bone, causing
systemic diseases, such as endocarditis and osteomyelitis.

Scalded Skin Syndrome

S. aureus is responsible for "scalded skin syndrome" (also called toxic epidermal necrosis, Ritter's disease and
Lyell's disease). This disease occurs in older children, typically in outbreaks caused by flowering of S. aureus strains
that produce exfoliatin (also called scalded skin syndrome toxin). Although the bacteria initially may infect only a minor
lesion, the toxin destroys intercellular connections, spreads epidermal layers and allows the infection to penetrate the
outer layer of the skin, producing the desquamation that typifies the disease. Shedding of the outer layer of skin
generally reveals normal skin below, but fluid lost in the process can produce severe injury in young children if it is not
treated properly.

Toxic Shock Syndrome 

Toxic shock syndrome is caused by strains of S. aureus that produce the so-called toxic shock syndrome toxin.
The disease can be caused by S. aureus infection at any site, but it is too often erroneously viewed as a
disease solely of women who use tampons. The disease involves toxaemia and septicaemia, and can be fatal.

Nosocomial Infections

In the 1984 National Nosocomial Infection Surveillance Study ("NNIS") S. aureus was the most prevalent agent
of surgical wound infections in many hospital services, including medicine, surgery, obstetrics, pediatrics and newborns.

Resistance to drugs of S. aureus strains

Prior to the introduction of penicillin the prognosis for patients seriously infected with S. aureus was unfavorable.
Following the introduction of penicillin in the early 1940s even the worst S. aureus infections generally could be treated
successfully. The emergence of penicillin-resistant strains of S. aureus did not take long, however. Most strains of S.
aureus encountered in hospital infections today do not respond to penicillin; although, fortunately, this is not the case
for S. aureus encountered in community infections.

It is well known now that penicillin-resistant strains of S. aureus produce a lactamase which converts penicillin to
penicilloic acid, and thereby destroys antibiotic activity. Furthermore, the lactamase gene often is propagated
episomally, typically on a plasmid, and often is only one of several genes on an episomal element that, together, confer
multidrug resistance.

Methicillins, introduced in the 1960s, largely overcame the problem of penicillin resistance in S. aureus. These
compounds conserve the portions of penicillin responsible for antibiotic activity and modify or alter other portions that
make penicillin a good substrate for inactivating lactamases. However, methicillin resistance has emerged in S. aureus,
along with resistance to many other antibiotics effective against this organism, including aminoglycosides, tetracycline,
chloramphenicol, macrolides and lincosamides. In fact, methicillin-resistant strains of S. aureus generally are multiply
drug resistant.

The molecular genetics of most types of drug resistance in S. aureus has been elucidated (See Lyon et al.,
Microbiology Reviews 51: 88-134 (1987)). Generally, resistance is mediated by plasmids, as noted above regarding penicillin
resistance; however, several stable forms of drug resistance have been observed that apparently involve integration
of a resistance element into the S. aureus genome itself.

Thus far each new antibiotic gives rise to resistant strains, strains emerge that are resistant to multiple drugs
and increasingly persistent forms of resistance begin to emerge. Drug resistance of S. aureus infections already poses
significant treatment difficulties, which are likely to get much worse unless new therapeutic agents are developed.

Molecular Genetics of Staphylococcus Aureus

Despite its importance in, among other things, human disease, relatively little is known about the genome of this
organism.

Most genetic studies of S. aureus have been carried out using the strain NCTC8325, which contains prophages
psi11, psi12 and psi13, and the UV-cured derivative of this strain, 8325-4 (also referred to as RN450), which is free of
the prophages.

These studies revealed that the S. aureus genome, like that of other staphylococci, consists of one circular,
covalently closed, double-stranded DNA and a collection of so-called variable accessory genetic elements, such as
prophages, plasmids, transposons and the like.

Physical characterization of the genome has not been carried out in any detail. Pattee et al. published a low
resolution and incomplete genetic and physical map of the chromosome of S. aureus strain NCTC 8325. (Pattee et al.,
Genetic and Physical Mapping of Chromosome of Staphylococcus aureus NCTC 8325, Chapter 11, pgs. 163-169 in
MOLECULAR BIOLOGY OF THE STAPHYLOCOCCI, R. P. Novick, Ed., VCH Publishers, New York (1990)). The genetic
map largely was produced by mapping insertions of Tn551 and Tn4001, which, respectively, confer erythromycin and
gentamicin resistance, and by analysis of SmaI-digested DNA by Pulsed Field Gel Electrophoresis ("PFGE").

The map was of low resolution; even estimating the physical size of the genome was difficult according to the
investigators. The size of the largest SmaI chromosome fragment, for instance, was too large for accurate sizing by
PFGE. To estimate its size, additional restriction sites had to be introduced into the chromosome using a transposon
containing a SmaI recognition sequence.

In sum, most physical characteristics and almost all of the genes of Staphylococcus aureus are unknown. Among
the few genes that have been identified, most have not been physically mapped or characterized in detail. Only a very
few genes of this organism have been sequenced. (See, for instance, Thornsberry, J. Antimicrobial Chemotherapy 21
Suppl. C: 9-16 (1988), current versions of GENBANK and other nucleic acid databases, and references that relate to
the genome of S. aureus such as those set out elsewhere herein.)

It is clear that the etiology of diseases mediated or exacerbated by S. aureus infection involves the programmed
expression of S. aureus genes, and that characterizing the genes and their patterns of expression would add
dramatically to our understanding of the organism and its host interactions. Knowledge of S. aureus genes and genomic
organization would dramatically improve understanding of disease etiology and lead to improved and new ways of
preventing, ameliorating, arresting and reversing diseases. Moreover, characterized genes and genomic fragments of
S. aureus would provide reagents for, among other things, detecting, characterizing and controlling S. aureus infections.

There is a need therefore to characterize the genome of S. aureus and for polynucleotides and sequences of this
organism.

The present invention is based on the sequencing of fragments of the Staphylococcus aureus genome. The primary
nucleotide sequences which were generated are provided in SEQ ID NOS:1-5,191.

The present invention provides the nucleotide sequence of several thousand contigs of the Staphylococcus aureus
genome, which are listed in tables below and set out in the Sequence Listing submitted herewith, and representative
fragments thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan. In one
embodiment, the present invention is provided as contiguous strings of primary sequence information corresponding to the
nucleotide sequences depicted in SEQ ID NOS:1-5,191.

The present invention further provides nucleotide sequences which are at least 95%, preferably 99% and most
preferably 99.9%, identical to the nucleotide sequences of SEQ ID NOS:1-5,191.

The nucleotide sequence of SEQ ID NOS:1-5,191, a representative fragment thereof, or a nucleotide sequence
which is at least 95%, preferably 99% and most preferably 99.9%, identical to the nucleotide sequence of SEQ ID
NOS:1-5,191 may be provided in a variety of mediums to facilitate its use. In one application of this embodiment, the
sequences of the present invention are recorded on computer readable media. Such media include, but are not limited
to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media
such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as
magnetic/optical storage media.

The present invention further provides systems, particularly computer-based systems which contain the sequence
information herein described stored in a data storage means. Such systems are designed to identify commercially
important fragments of the Staphylococcus aureus genome.

Another embodiment of the present invention is directed to fragments, preferably isolated fragments, of the
Staphylococcus aureus genome having particular structural or functional attributes. Such fragments of the Staphylococcus
aureus genome of the present invention include, but are not limited to, fragments which encode peptides, hereinafter
referred to as open reading frames or "ORFs," fragments which modulate the expression of an operably linked ORF,
hereinafter referred to as expression modulating fragments or "EMFs," and fragments which can be used to diagnose
the presence of Staphylococcus aureus in a sample, hereinafter referred to as diagnostic fragments or "DFs."

Each of the ORFs in fragments of the Staphylococcus aureus genome disclosed in Tables 1-3, and the EMFs
found 5' to the ORFs, can be used in numerous ways as polynucleotide reagents. For instance, the sequences can be
used as diagnostic probes or amplification primers for detecting or determining the presence of a specific microbe in
a sample, to selectively control gene expression in a host and in the production of polypeptides, such as polypeptides
encoded by ORFs of the present invention, particularly those polypeptides that have a pharmacological activity.

The present invention further includes recombinant constructs comprising one or more fragments of the
Staphylococcus aureus genome of the present invention. The recombinant constructs of the present invention comprise
vectors, such as a plasmid or viral vector, into which a fragment of the Staphylococcus aureus genome has been inserted.

The present invention further provides host cells containing any of the isolated fragments of the Staphylococcus
aureus genome of the present invention. The host cells can be a higher eukaryotic host cell, such as a mammalian
cell, a lower eukaryotic cell, such as a yeast cell, or a procaryotic cell such as a bacterial cell.

The present invention is further directed to polypeptides and proteins, preferably isolated polypeptides and
proteins, encoded by ORFs of the present invention. A variety of methods, well known to those of skill in the art, routinely
may be utilized to obtain any of the polypeptides and proteins of the present invention. For instance, polypeptides and
proteins of the present invention having relatively short, simple amino acid sequences readily can be synthesized using
commercially available automated peptide synthesizers. Polypeptides and proteins of the present invention also may
be purified from bacterial cells which naturally produce the protein. Yet another alternative is to purify polypeptides and
proteins of the present invention from cells which have been altered to express them.

The invention further provides polypeptides, preferably isolated polypeptides, comprising Staphylococcus aureus
epitopes and vaccine compositions comprising such polypeptides. Also provided are methods for vaccinating an
individual against Staphylococcus aureus infection.

The invention further provides methods of obtaining homologs of the fragments of the Staphylococcus aureus
genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention.
Specifically, by using the nucleotide and amino acid sequences disclosed herein as a probe or as primers, and techniques
such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.

The invention further provides antibodies which selectively bind polypeptides and proteins of the present invention.
Such antibodies include both monoclonal and polyclonal antibodies.

The invention further provides hybridomas which produce the above-described antibodies. A hybridoma is an
immortalized cell line which is capable of secreting a specific monoclonal antibody.

The present invention further provides methods of identifying test samples derived from cells which express one
of the ORFs of the present invention, or a homolog thereof. Such methods comprise incubating a test sample with one
or more of the antibodies of the present invention, or one or more of the DFs or antigens of the present invention, under
conditions which allow a skilled artisan to determine if the sample contains the ORF or product produced therefrom.

In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry
out the above-described assays.

Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers
which comprises: (a) a first container comprising one of the antibodies, antigens, or one of the DFs of the present
invention; and (b) one or more other containers comprising one or more of the following: wash reagents, reagents
capable of detecting presence of bound antibodies, antigens or hybridized DFs.

Using the isolated proteins of the present invention, the present invention further provides methods of obtaining
and identifying agents capable of binding to a polypeptide or protein encoded by one of the ORFs of the present
invention. Specifically, such agents include, as further described below, antibodies, peptides, carbohydrates,
pharmaceutical agents and the like. Such methods comprise steps of: (a) contacting an agent with an isolated protein encoded
by one of the ORFs of the present invention; and (b) determining whether the agent binds to said protein.

The present genomic sequences of Staphylococcus aureus will be of great value to all laboratories working with
this organism and for a variety of commercial purposes. Many fragments of the Staphylococcus aureus genome will
be immediately identified by similarity searches against GenBank or protein databases and will be of immediate value
to Staphylococcus aureus researchers and of immediate commercial value for the production of proteins or to control
gene expression.

The methodology and technology for elucidating extensive genomic sequences of bacterial and other genomes
has enhanced and will greatly enhance the ability to analyze and understand chromosomal organization. In particular,
sequenced contigs and genomes will provide the models for developing tools for the analysis of chromosome structure and
function, including the ability to identify genes within large segments of genomic DNA, the structure, position, and spacing of
regulatory elements, the identification of genes with potential industrial applications, and the ability to do comparative
genomics and molecular phylogeny.

FIGURE 1 is a block diagram of a computer system (102) that can be used to implement computer-based systems
of the present invention.

FIGURE 2 is a schematic diagram depicting the data flow and computer programs used to collect, assemble, edit
and annotate the contigs of the Staphylococcus aureus genome of the present invention. Both Macintosh and Unix
platforms are used to handle the AB 373 and 377 sequence data files, largely as described in Kerlavage et al.,
Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, 585, IEEE Computer
Society Press, Washington D.C. (1993). Factura (AB) is a Macintosh program designed for automatic vector sequence
removal and end-trimming of sequence files. The program Loadis runs on a Macintosh platform and parses the feature
data extracted from the sequence files by Factura to the Unix based Staphylococcus aureus relational database.
Assembly of contigs (and whole genome sequences) is accomplished by retrieving a specific set of sequence files and
their associated features using extrseq, a Unix utility for retrieving sequences from an SQL database. The resulting
sequence file is processed by seq_filter to trim portions of the sequences with more than 2% ambiguous nucleotides.
The sequence files were assembled using TIGR Assembler, an assembly engine designed at The Institute for Genomic
Research ("TIGR") for rapid and accurate assembly of thousands of sequence fragments. The collection of contigs
generated by the assembly step is loaded into the database with the lassie program. Identification of open reading
frames (ORFs) is accomplished by processing contigs with zorf. The ORFs are searched against S. aureus sequences
from GenBank and against all protein sequences using the BLASTN and BLASTP programs, described in Altschul et
al., J. Mol. Biol. 215: 403-410 (1990). Results of the ORF determination and similarity searching steps were loaded
into the database. As described below, some results of the determination and the searches are set out in Tables 1-3.
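
Two of the steps in this data flow, screening reads for ambiguous nucleotides and locating open reading frames, can be illustrated with a short sketch. It is not the seq_filter or zorf program, whose internals are not described here; the function names, the set of start codons and the minimum-length parameter are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumption: standard stop codons and common bacterial start codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}


def ambiguous_fraction(seq: str) -> float:
    """Fraction of positions that are not an unambiguous A, C, G or T."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base not in "ACGT" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (other characters pass through)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int, int]]:
    """Scan the three forward frames for start-to-stop ORFs.

    Returns (frame, start, end) tuples; start is inclusive, end is
    exclusive and includes the stop codon. min_codons counts the codons
    from the start codon up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs
```

A read could be flagged when ambiguous_fraction(read) exceeds 0.02, mirroring the 2% figure given above, and the opposite strand can be scanned by calling find_orfs(reverse_complement(seq)).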

The present invention is based on the sequencing of fragments of the Staphylococcus aureus genome and analysis
of the sequences. The primary nucleotide sequences generated by sequencing the fragments are provided in SEQ ID
NOS:1-5,191. (As used herein, the "primary sequence" refers to the nucleotide sequence represented by the IUPAC
nomenclature system.)

In addition to the aforementioned Staphylococcus aureus polynucleotides and polynucleotide sequences, the
present invention provides the nucleotide sequences of SEQ ID NOS:1-5,191, or representative fragments thereof, in
a form which can be readily used, analyzed, and interpreted by a skilled artisan.

As used herein, a "representative fragment of the nucleotide sequence depicted in SEQ ID NOS:1-5,191" refers
to any portion of the SEQ ID NOS:1-5,191 which is not presently represented within a publicly available database.
Preferred representative fragments of the present invention are Staphylococcus aureus open reading frames ("ORFs"),
expression modulating fragments ("EMFs") and fragments which can be used to diagnose the presence of
Staphylococcus aureus in a sample ("DFs"). A non-limiting identification of preferred representative fragments is provided in Tables
1-3.

As discussed in detail below, the information provided in SEQ ID NOS:1-5,191 and in Tables 1-3 together with
routine cloning, synthesis, sequencing and assay methods will enable those skilled in the art to clone and sequence
all "representative fragments" of interest, including open reading frames encoding a large variety of Staphylococcus
aureus proteins.

While the presently disclosed sequences of SEQ ID NOS:1-5,191 are highly accurate, sequencing techniques are
not perfect and, in relatively rare instances, further investigation of a fragment or sequence of the invention may reveal
a nucleotide sequence error present in a nucleotide sequence disclosed in SEQ ID NOS:1-5,191. However, once the
present invention is made available (i.e., once the information in SEQ ID NOS:1-5,191 and Tables 1-3 has been made
available), resolving a rare sequencing error in SEQ ID NOS:1-5,191 will be well within the skill of the art. The present
disclosure makes available sufficient sequence information to allow any of the described contigs or portions thereof to
be obtained readily by straightforward application of routine techniques. Further sequencing of such polynucleotides
may proceed in like manner using manual and automated sequencing methods which are employed ubiquitously in the
art. Nucleotide sequence editing software is publicly available. For example, Applied Biosystems' (AB) AutoAssembler
can be used as an aid during visual inspection of nucleotide sequences. By employing such routine techniques potential
errors readily may be identified and the correct sequence then may be ascertained by targeting further sequencing
effort, also of a routine nature, to the region containing the potential error.

Even if all of the very rare sequencing errors in SEQ ID NOS:1-5,191 were corrected, the resulting nucleotide
sequences would still be at least 95% identical, nearly all would be at least 99% identical, and the great majority would
be at least 99.9% identical to the nucleotide sequences of SEQ ID NOS:1-5,191.

As discussed elsewhere herein, polynucleotides of the present invention readily may be obtained by routine
application of well known and standard procedures for cloning and sequencing DNA. Detailed methods for obtaining
libraries and for sequencing are provided below, for instance. A wide variety of Staphylococcus aureus strains that can
be used to prepare S. aureus genomic DNA for cloning and for obtaining polynucleotides of the present invention are
available to the public from recognized depository institutions, such as the American Type Culture Collection ("ATCC").

The nucleotide sequences of the genomes from different strains of Staphylococcus aureus differ somewhat.
However, the nucleotide sequences of the genomes of all Staphylococcus aureus strains will be at least 95% identical, in
corresponding part, to the nucleotide sequences provided in SEQ ID NOS:1-5,191. Nearly all will be at least 99%
identical and the great majority will be 99.9% identical.

Thus, the present invention further provides nucleotide sequences which are at least 95%, preferably 99% and
most preferably 99.9% identical to the nucleotide sequences of SEQ ID NOS:1-5,191, in a form which can be readily
used, analyzed and interpreted by the skilled artisan.

Methods for determining whether a nucleotide sequence is at least 95%, at least 99% or at least 99.9% identical
to the nucleotide sequences of SEQ ID NOS:1-5,191 are routine and readily available to the skilled artisan. For example,
the well known FASTA algorithm described in Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85: 2444 (1988) can be
used to generate the percent identity of nucleotide sequences. The BLASTN program also can be used to generate
an identity score of polynucleotides compared to one another.
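
As a rough illustration of what a percent identity figure means, the sketch below scores two sequences that have already been aligned to equal length, with "-" marking gaps. It is neither the FASTA nor the BLASTN algorithm, both of which also perform the alignment itself; the function names and the choice to count a gap opposite a base as a mismatch are assumptions made for this example.

```python
from typing import Optional


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same base.

    Columns where both sequences have a gap are skipped; a gap opposite a
    base is counted as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def identity_tier(pct: float) -> Optional[float]:
    """Highest of the 99.9, 99 and 95 percent thresholds that pct meets, or None."""
    for threshold in (99.9, 99.0, 95.0):
        if pct >= threshold:
            return threshold
    return None
```

Under these assumptions, a 20-nucleotide alignment with a single mismatch scores 95.0, so a sequence of that length meets the 95% threshold with at most one difference.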

COMPUTER RELATED EMBODIMENTS 

The nucleotide sequences provided in SEQ ID NOS:1-5,191, a representative fragment thereof, or a nucleotide
sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical to a polynucleotide
sequence of SEQ ID NOS:1-5,191 may be "provided" in a variety of mediums to facilitate use thereof. As used herein,
"provided" refers to a manufacture, other than an isolated nucleic acid molecule, which contains a nucleotide sequence
of the present invention; i.e., a nucleotide sequence provided in SEQ ID NOS:1-5,191, a representative fragment
thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical
to a polynucleotide of SEQ ID NOS:1-5,191. Such a manufacture provides a large portion of the Staphylococcus aureus
genome and parts thereof (e.g., a Staphylococcus aureus open reading frame (ORF)) in a form which allows a skilled
artisan to examine the manufacture using means not directly applicable to examining the Staphylococcus aureus
genome or a subset thereof as it exists in nature or in purified form.

In one application of this embodiment, a nucleotide sequence of the present invention can be recorded on computer
readable media. As used herein, "computer readable media" refers to any medium which can be read and accessed
directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard
disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as
RAM and ROM; and hybrids of these categories, such as magnetic/optical storage media. A skilled artisan can readily
appreciate how any of the presently known computer readable mediums can be used to create a manufacture
comprising computer readable medium having recorded thereon a nucleotide sequence of the present invention. Likewise,
it will be clear to those of skill how additional computer readable media that may be developed also can be used to
create analogous manufactures having recorded thereon a nucleotide sequence of the present invention.

As used herein, "recorded" refers to a process for storing information on computer readable medium. A skilled
artisan can readily adopt any of the presently known methods for recording information on computer readable medium
to generate manufactures comprising the nucleotide sequence information of the present invention.

A variety of data storage structures are available to a skilled artisan for creating a computer readable medium
having recorded thereon a nucleotide sequence of the present invention. The choice of the data storage structure will
generally be based on the means chosen to access the stored information. In addition, a variety of data processor
programs and formats can be used to store the nucleotide sequence information of the present invention on computer
readable medium. The sequence information can be represented in a word processing text file, formatted in
commercially-available software such as WordPerfect and Microsoft Word, or represented in the form of an ASCII file, stored
in a database application, such as DB2, Sybase, Oracle, or the like. A skilled artisan can readily adapt any number of
data-processor structuring formats (e.g., text file or database) in order to obtain computer readable medium having
recorded thereon the nucleotide sequence information of the present invention.
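
By way of illustration only, the two storage choices named above (an ASCII text file and a database application) can be sketched as follows. The sketch assumes a FASTA-style text layout and uses SQLite as a stand-in for a database application such as DB2, Sybase or Oracle; neither choice is taken from the present disclosure, and the identifiers and sequences shown are hypothetical placeholders rather than sequences of SEQ ID NOS:1-5,191.

```python
import sqlite3


def to_fasta(records, width=60):
    """Render (identifier, sequence) pairs as FASTA-style ASCII text."""
    lines = []
    for ident, seq in records:
        lines.append(f">{ident}")
        # Wrap each sequence to a fixed line width.
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


def store_in_database(records):
    """Store the same records in an in-memory SQLite table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contig (id TEXT PRIMARY KEY, sequence TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO contig VALUES (?, ?)", records)
    conn.commit()
    return conn


# Hypothetical placeholder data, not sequences from the sequence listing.
records = [("contig_1", "ATGAAAGGTTAA"), ("contig_2", "ATGCCCTGA")]
```

Which structure is preferable depends, as stated above, on the means chosen to access the stored information: a flat text file suits sequential scanning by a search program, whereas a database table suits lookup of individual contigs by identifier.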

Computer software is publicly available which allows a skilled artisan to access sequence information provided in
a computer readable medium. Thus, by providing in computer readable form the nucleotide sequences of SEQ ID
NOS:1-5,191, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and
most preferably at least 99.9% identical to a sequence of SEQ ID NOS:1-5,191, the present invention enables the
skilled artisan routinely to access the provided sequence information for a wide variety of purposes.

The examples which follow demonstrate how software which implements the BLAST (Altschul et al., J. Mol. Biol.
215:403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem. 17:203-207 (1993)) search algorithms on a Sybase
system was used to identify open reading frames (ORFs) within the Staphylococcus aureus genome which contain
homology to ORFs or proteins from both Staphylococcus aureus and from other organisms. Among the ORFs discussed
herein are protein encoding fragments of the Staphylococcus aureus genome useful in producing commercially
important proteins, such as enzymes used in fermentation reactions and in the production of commercially useful metabolites.

The present invention further provides systems, particularly computer-based systems, which contain the sequence
information described herein. Such systems are designed to identify, among other things, commercially important
fragments of the Staphylococcus aureus genome.

As used herein, "a computer-based system" refers to the hardware means, software means, and data storage
means used to analyze the nucleotide sequence information of the present invention. The minimum hardware means
of the computer-based systems of the present invention comprises a central processing unit (CPU), input means,
output means, and data storage means. A skilled artisan can readily appreciate that any one of the currently available
computer-based systems is suitable for use in the present invention.

As stated above, the computer-based systems of the present invention comprise a data storage means having
stored therein a nucleotide sequence of the present invention and the necessary hardware means and software means
for supporting and implementing a search means.

As used herein, "data storage means" refers to memory which can store nucleotide sequence information of the
present invention, or a memory access means which can access manufactures having recorded thereon the nucleotide
sequence information of the present invention.

As used herein, "search means" refers to one or more programs which are implemented on the computer-based
system to compare a target sequence or target structural motif with the sequence information stored within the data
storage means. Search means are used to identify fragments or regions of the present genomic sequences which
match a particular target sequence or target motif. A variety of known algorithms are disclosed publicly and a variety
of commercially available software for conducting search means is available and can be used in the computer-based systems
of the present invention. Examples of such software include, but are not limited to, MacPattern (EMBL), BLASTN and
BLASTX (NCBIA). A skilled artisan can readily recognize that any one of the available algorithms or implementing
software packages for conducting homology searches can be adapted for use in the present computer-based systems.

As used herein, a "target sequence" can be any DNA or amino acid sequence of six or more nucleotides or two
or more amino acids. A skilled artisan can readily recognize that the longer a target sequence is, the less likely a target
sequence will be present as a random occurrence in the database. The most preferred sequence length of a target
sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide residues. However, it is well recognized
that searches for commercially important fragments, such as sequence fragments involved in gene expression and
protein processing, may be of shorter length.
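
The relationship between target length and chance occurrence can be made concrete with a short calculation. The following is a minimal sketch, assuming that residues in the database are independent and equally frequent (an idealization that real genomes do not satisfy) and counting only exact matches on a single strand; the function name and the database lengths used are hypothetical.

```python
def expected_random_hits(target_len, db_len, alphabet_size=4):
    """Expected number of exact chance matches of one target in a database.

    Assumes independent, equally frequent residues: each of the
    (db_len - target_len + 1) placements matches with probability
    alphabet_size ** -target_len.
    """
    if target_len > db_len:
        return 0.0
    positions = db_len - target_len + 1
    return positions * alphabet_size ** (-target_len)


# A six-nucleotide target is expected about once by chance in roughly
# 4,096 positions, whereas a 30-nucleotide target is effectively never
# expected by chance in a database of the same size.
short_target = expected_random_hits(6, 4101)
long_target = expected_random_hits(30, 4101)
```

Under these assumptions the expected count falls by a factor of four for every added nucleotide (or twenty for every added amino acid, using `alphabet_size=20`), which is why longer targets are preferred where specificity matters.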

As used herein, "a target structural motif," or "target motif," refers to any rationally selected sequence or
combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration which is formed
upon the folding of the target motif. There are a variety of target motifs known in the art. Protein target motifs include,
but are not limited to, enzymic active sites and signal sequences. Nucleic acid target motifs include, but are not limited
to, promoter sequences, hairpin structures and inducible expression elements (protein binding sequences).

A variety of structural formats for the input and output means can be used to input and output the information in
the computer-based systems of the present invention. A preferred format for an output means ranks fragments of the
Staphylococcus aureus genomic sequences possessing varying degrees of homology to the target sequence or target
motif. Such presentation provides a skilled artisan with a ranking of sequences which contain various amounts of the
target sequence or target motif and identifies the degree of homology contained in the identified fragment.
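
A ranked output of this kind can be sketched as follows. This is a deliberately naive illustration, assuming an ungapped, exhaustive comparison of the target against each stored fragment; it is not the BLAST or BLAZE algorithm, and the function names and the example sequences are hypothetical.

```python
def best_ungapped_identity(target, fragment):
    """Highest fraction of matching positions over all ungapped
    placements of the target within the fragment."""
    n = len(target)
    if n == 0 or len(fragment) < n:
        return 0.0
    best = 0
    for start in range(len(fragment) - n + 1):
        window = fragment[start:start + n]
        matches = sum(a == b for a, b in zip(target, window))
        best = max(best, matches)
    return best / n


def rank_fragments(target, fragments):
    """Return (name, identity) pairs sorted from most to least similar."""
    scored = [
        (name, best_ungapped_identity(target, seq))
        for name, seq in fragments.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical stored fragments and target.
fragments = {
    "frag_a": "TTTACGTACGTTT",
    "frag_b": "TTTACGAACGTTT",
    "frag_c": "GGGGGGGGGGGGG",
}
ranking = rank_fragments("ACGTACG", fragments)
```

Each entry of `ranking` pairs a fragment with its degree of identity to the target, so the list serves both purposes described above: ordering the fragments and reporting the degree of homology for each.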

A variety of comparing means can be used to compare a target sequence or target motif with the data storage
means to identify sequence fragments of the Staphylococcus aureus genome. In the present examples, implementing
software which implements the BLAST and BLAZE algorithms, described in Altschul et al., J. Mol. Biol. 215:403-410
(1990), was used to identify open reading frames within the Staphylococcus aureus genome. A skilled artisan can
readily recognize that any one of the publicly available homology search programs can be used as the search means
for the computer-based systems of the present invention. Of course, suitable proprietary systems that may be known
to those of skill also may be employed in this regard.

Figure 1 provides a block diagram of a computer system illustrative of embodiments of this aspect of the present
invention. The computer system 102 includes a processor 106 connected to a bus 104. Also connected to the bus 104
are a main memory 108 (preferably implemented as random access memory, RAM) and a variety of secondary storage
devices 110, such as a hard drive 112 and a removable medium storage device 114. The removable medium storage
device 114 may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc. A removable
storage medium 116 (such as a floppy disk, a compact disk, a magnetic tape, etc.) containing control logic and/or data
recorded therein may be inserted into the removable medium storage device 114. The computer system 102 includes
appropriate software for reading the control logic and/or the data from the removable medium storage device 114 once
it is inserted into the removable medium storage device 114.

A nucleotide sequence of the present invention may be stored in a well known manner in the main memory 108,
any of the secondary storage devices 110, and/or a removable storage medium 116. During execution, software for
accessing and processing the genomic sequence (such as search tools, comparing tools, etc.) resides in main memory
108, in accordance with the requirements and operating parameters of the operating system, the hardware system
and the software program or programs.



BIOCHEMICAL EMBODIMENTS



Other embodiments of the present invention are directed to fragments of the Staphylococcus aureus genome,
preferably to isolated fragments. The fragments of the Staphylococcus aureus genome of the present invention include,
but are not limited to, fragments which encode peptides, hereinafter open reading frames (ORFs), fragments which
modulate the expression of an operably linked ORF, hereinafter expression modulating fragments (EMFs) and
fragments which can be used to diagnose the presence of Staphylococcus aureus in a sample, hereinafter diagnostic
fragments (DFs).

As used herein, an "isolated nucleic acid molecule" or an "isolated fragment of the Staphylococcus aureus genome"
refers to a nucleic acid molecule possessing a specific nucleotide sequence which has been subjected to purification
means to reduce, from the composition, the number of compounds which are normally associated with the composition.
Particularly, the term refers to the nucleic acid molecules having the sequences set out in SEQ ID NOS:1-5,191, to
representative fragments thereof as described above, to polynucleotides at least 95%, preferably at least 99% and
especially preferably at least 99.9% identical in sequence thereto, also as set out above.

A variety of purification means can be used to generate the isolated fragments of the present invention. These
include, but are not limited to, methods which separate constituents of a solution based on charge, solubility, or size.

In one embodiment, Staphylococcus aureus DNA can be mechanically sheared to produce fragments of 15-20 kb
in length. These fragments can then be used to generate a Staphylococcus aureus library by inserting them into
lambda clones as described in the Examples below. Primers flanking, for example, an ORF, such as those enumerated
in Tables 1-3, can then be generated using nucleotide sequence information provided in SEQ ID NOS:1-5,191. Well
known and routine techniques of PCR cloning then can be used to isolate the ORF from the lambda DNA library of
Staphylococcus aureus genomic DNA. Thus, given the availability of SEQ ID NOS:1-5,191, the information in Tables
1, 2 and 3, and the information that may be obtained readily by analysis of the sequences of SEQ ID NOS:1-5,191
using methods set out above, those of skill will be enabled by the present disclosure to isolate any ORF-containing or
other nucleic acid fragment of the present invention.

The isolated nucleic acid molecules of the present invention include, but are not limited to, single stranded and
double stranded DNA, and single stranded RNA.

As used herein, an "open reading frame," ORF, means a series of triplets coding for amino acids without any 
termination codons and is a sequence translatable into protein. 
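
The definition above can be illustrated with a short sketch that reports stretches of triplets free of termination codons. It is offered under stated assumptions only: it scans the three forward reading frames, uses the three standard termination codons, and applies a hypothetical minimum length; it is not the GeneMark analysis used to prepare Tables 1-3, and the example contig is a placeholder.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_free_stretches(contig, min_codons=3):
    """Return (frame, first_nucleotide, length_nt) for each run of codons
    without a termination codon in the three forward frames.

    Frames are numbered +1 to +3 from the first 5' nucleotide, and
    positions are 1-based from the 5' end of the contig.
    """
    contig = contig.upper()
    results = []
    for offset in range(3):
        run_start = offset
        pos = offset
        while pos + 3 <= len(contig):
            if contig[pos:pos + 3] in STOP_CODONS:
                # Close the current run at the termination codon.
                if (pos - run_start) // 3 >= min_codons:
                    results.append((offset + 1, run_start + 1, pos - run_start))
                run_start = pos + 3
            pos += 3
        # Report a run that reaches the end of the contig.
        if (pos - run_start) // 3 >= min_codons:
            results.append((offset + 1, run_start + 1, pos - run_start))
    return results


# Hypothetical placeholder contig.
example_contig = "ATGAAACCCGGGTAAATGTTTTAG"
```

The three values returned for each stretch correspond to a reading frame, the first nucleotide counted from the 5' end, and a length in nucleotides. A fuller treatment would also scan the complementary strand and apply a statistical coding model.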

Tables 1, 2 and 3 list ORFs in the Staphylococcus aureus genomic contigs of the present invention that were
identified as putative coding regions by the GeneMark software using organism-specific second-order Markov
probability transition matrices. It will be appreciated that other criteria can be used, in accordance with well known analytical
methods, such as those discussed herein, to generate more inclusive, more restrictive or more selective lists.

Table 1 sets out ORFs in the Staphylococcus aureus contigs of the present invention that are at least 80 amino
acids long and over a continuous region of at least 50 bases which are 95% or more identical (by BLAST analysis) to
an S. aureus nucleotide sequence available through Genbank in November 1996.

Table 2 sets out ORFs in the Staphylococcus aureus contigs of the present invention that are not in Table 1 and
match, with a BLASTP probability score of 0.01 or less, a polypeptide sequence available through Genbank by
September 1996.

Table 3 sets out ORFs in the Staphylococcus aureus contigs of the present invention that do not match significantly,
by BLASTP analysis, a polypeptide sequence available through Genbank by September 1996.

In each table, the first and second columns identify the ORF by, respectively, contig number and ORF number
within the contig; the third column indicates the reading frame, taking the first 5' nucleotide of the contig as the start of
the +1 frame; the fourth column indicates the first nucleotide of the ORF, counting from the 5' end of the contig strand;
and the fifth column indicates the length of each ORF in nucleotides.
In Tables 1 and 2, column six lists the "Reference" for the closest matching sequence available through Genbank.
These reference numbers are the database entry numbers commonly used by those of skill in the art, who will be
familiar with their denominators. Descriptions of the nomenclature are available from the National Center for
Biotechnology Information. Column seven in Tables 1 and 2 provides the "gene name" of the matching sequence; column eight
provides the "BLAST identity" score from the comparison of the ORF and the homologous gene; and column nine
indicates the length in nucleotides of the "highest scoring segment pair" identified by the BLAST identity analysis.

In Table 3, the last column, column six, indicates the length of each ORF in amino acid residues. 

The concepts of percent identity and percent similarity of two polypeptide sequences are well understood in the art.
For example, two polypeptides 10 amino acids in length which differ at three amino acid positions (e.g., at positions
1, 3 and 5) are said to have a percent identity of 70%. However, the same two polypeptides would be deemed to have
a percent similarity of 80% if, for example at position 5, the amino acid moieties, although not identical, were "similar"
(i.e., possessed similar biochemical characteristics). Many programs for analysis of nucleotide or amino acid sequence
similarity, such as FASTA and BLAST, specifically list percent identity of a matching region as an output parameter. Thus,
for instance, Tables 1 and 2 herein enumerate the "percent identity" of the "highest scoring segment pair" in each ORF
and its listed relative. Further details concerning the algorithms and criteria used for homology searches are provided
below and are described in the pertinent literature highlighted by the citations provided below.
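
The arithmetic of the example above can be reproduced directly. The following is a minimal sketch, assuming two pre-aligned sequences of equal length and a hypothetical similarity rule in which only aspartate (D) and glutamate (E) are treated as similar; search programs in practice derive similarity from a full substitution matrix, and the two peptides shown are invented for illustration.

```python
def percent_identity(a, b):
    """Percentage of aligned positions carrying identical residues."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal and non-empty")
    identical = sum(x == y for x, y in zip(a, b))
    return 100 * identical / len(a)


def percent_similarity(a, b, similar_pairs):
    """Percentage of aligned positions that are identical or similar."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal and non-empty")
    alike = sum(
        x == y or frozenset((x, y)) in similar_pairs for x, y in zip(a, b)
    )
    return 100 * alike / len(a)


# Hypothetical rule: only D and E (both acidic) count as similar.
SIMILAR = {frozenset("DE")}

# Invented 10-residue peptides differing at positions 1, 3 and 5,
# with a D/E difference at position 5.
peptide_1 = "MKTADLLVGS"
peptide_2 = "AKWAELLVGS"

identity = percent_identity(peptide_1, peptide_2)               # 70.0
similarity = percent_similarity(peptide_1, peptide_2, SIMILAR)  # 80.0
```

Seven of ten positions are identical, giving 70%; counting the similar pair at position 5 raises the figure to 80%, matching the worked example in the text.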

It will be appreciated that other criteria can be used to generate more inclusive and more exclusive listings of the
types set out in the tables. As those of skill will appreciate, narrow and broad searches both are useful. Thus, a skilled
artisan can readily identify ORFs in contigs of the Staphylococcus aureus genome other than those listed in Tables
1-3, such as ORFs which are overlapping or encoded by the opposite strand of an identified ORF in addition to those
ascertainable using the computer-based systems of the present invention.

As used herein, an "expression modulating fragment," EMF, means a series of nucleotide molecules which
modulates the expression of an operably linked ORF or EMF.

As used herein, a sequence is said to "modulate the expression of an operably linked sequence" when the
expression of the sequence is altered by the presence of the EMF. EMFs include, but are not limited to, promoters, and
promoter modulating sequences (inducible elements). One class of EMFs are fragments which induce the expression
of an operably linked ORF in response to a specific regulatory factor or physiological event.

EMF sequences can be identified within the contigs of the Staphylococcus aureus genome by their proximity to
the ORFs provided in Tables 1-3. An intergenic segment, or a fragment of the intergenic segment, from about 10 to
200 nucleotides in length, taken from any one of the ORFs of Tables 1-3 will modulate the expression of an operably
linked ORF in a fashion similar to that found with the naturally linked ORF sequence. As used herein, an "intergenic
segment" refers to fragments of the Staphylococcus aureus genome which are between two ORF(s) herein described.
EMFs also can be identified using known EMFs as a target sequence or target motif in the computer-based systems
of the present invention. Further, the two methods can be combined and used together.
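
Identification of candidate EMFs by proximity can be sketched by collecting the segments lying between consecutive ORFs of a contig. The sketch below assumes that each ORF is given as a first nucleotide (1-based) and a length in nucleotides on the same strand, and applies the approximate 10 to 200 nucleotide range stated above as a filter; the function name, the coordinates and the contig are hypothetical.

```python
def intergenic_segments(contig, orfs, min_len=10, max_len=200):
    """Return (start, end, sequence) for gaps between consecutive ORFs.

    `orfs` is a list of (first_nucleotide, length_nt) pairs, 1-based.
    Overlapping or abutting ORFs leave no gap and are skipped.
    """
    spans = sorted((start, start + length - 1) for start, length in orfs)
    segments = []
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        gap_start, gap_end = prev_end + 1, next_start - 1
        gap_len = gap_end - gap_start + 1
        if min_len <= gap_len <= max_len:
            # Convert 1-based inclusive coordinates to a Python slice.
            segments.append((gap_start, gap_end, contig[gap_start - 1:gap_end]))
    return segments


# Hypothetical contig: three 30-nucleotide ORFs separated by gaps of
# 12 and 5 nucleotides.
example = "A" * 30 + "C" * 12 + "G" * 30 + "T" * 5 + "A" * 30
candidates = intergenic_segments(example, [(1, 30), (43, 30), (78, 30)])
```

Only the 12-nucleotide gap is returned here, since the 5-nucleotide gap falls below the lower bound. Segments found this way are candidates only; their activity would still need confirmation, for example with an EMF trap vector as described in this specification.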

The presence and activity of an EMF can be confirmed using an EMF trap vector. An EMF trap vector contains a
cloning site linked to a marker sequence. A marker sequence encodes an identifiable phenotype, such as antibiotic
resistance or a complementing nutrition auxotrophic factor, which can be identified or assayed when the EMF trap
vector is placed within an appropriate host under appropriate conditions. As described above, an EMF will modulate the
expression of an operably linked marker sequence. A more detailed discussion of various marker sequences is provided
below.

A sequence which is suspected as being an EMF is cloned in all three reading frames in one or more restriction
sites upstream from the marker sequence in the EMF trap vector. The vector is then transformed into an appropriate
host using known procedures and the phenotype of the transformed host is examined under appropriate conditions.
As described above, an EMF will modulate the expression of an operably linked marker sequence.

As used herein, a "diagnostic fragment," DF, means a series of nucleotide molecules which selectively hybridize
to Staphylococcus aureus sequences. DFs can be readily identified by identifying unique sequences within contigs of
the Staphylococcus aureus genome, such as by using well-known computer analysis software, and by generating and
testing probes or amplification primers consisting of the DF sequence in an appropriate diagnostic format which
determines amplification or hybridization selectivity.

The sequences falling within the scope of the present invention are not limited to the specific sequences herein
described, but also include allelic and species variations thereof. Allelic and species variations can be routinely
determined by comparing the sequences provided in SEQ ID NOS:1-5,191, a representative fragment thereof, or a nucleotide
sequence at least 95%, preferably 99% and most preferably 99.9% identical to SEQ ID NOS:1-5,191, with a sequence
from another isolate of the same species.

Furthermore, to accommodate codon variability, the invention includes nucleic acid molecules coding for the same
amino acid sequences as do the nucleic acid sequences mentioned above. In other words, in the coding region of an
ORF, substitution of one codon for another which encodes the same amino acid is expressly contemplated.
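
Codon substitution of this kind can be checked by translating two coding sequences and comparing the results. The sketch below assumes the standard genetic code, written in the conventional TCAG ordering; the function name and the two nine-nucleotide sequences are hypothetical examples, not sequences of the present invention.

```python
from itertools import product

# Standard genetic code in TCAG order; "*" marks a termination codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence):
    """Translate a DNA coding sequence codon by codon, ignoring any
    trailing bases that do not complete a codon."""
    seq = coding_sequence.upper()
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3)
    )


# Two sequences differing only by synonymous codons
# (GCT/GCC for alanine, AAA/AAG for lysine).
original = "ATGGCTAAA"
variant = "ATGGCCAAG"
same_protein = translate(original) == translate(variant)
```

Both sequences translate to the same three residues, so `same_protein` is true even though the nucleotide sequences differ at two positions.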

Any specific sequence disclosed herein can be readily screened for errors by resequencing a particular fragment,
such as an ORF, in both directions (i.e., sequence both strands). Alternatively, error screening can be performed by
sequencing corresponding polynucleotides of Staphylococcus aureus origin isolated by using part or all of the fragments
in question as a probe or primer.

Each of the ORFs of the Staphylococcus aureus genome disclosed in Tables 1, 2 and 3, and the EMFs found 5'
to the ORFs, can be used as polynucleotide reagents in numerous ways. For example, the sequences can be used
as diagnostic probes or diagnostic amplification primers to detect the presence of a specific microbe in a sample,
particularly Staphylococcus aureus. Especially preferred in this regard are ORFs such as those of Table 3, which do not
match previously characterized sequences from other organisms and thus are most likely to be highly selective for
Staphylococcus aureus. Also particularly preferred are ORFs that can be used to distinguish between strains of
Staphylococcus aureus, particularly those that distinguish medically important strains, such as drug-resistant strains.

In addition, the fragments of the present invention, as broadly described, can be used to control gene expression
through triple helix formation or antisense DNA or RNA, both of which methods are based on the binding of a
polynucleotide sequence to DNA or RNA. Triple helix formation optimally results in a shut-off of RNA transcription from DNA,
while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Information from the
sequences of the present invention can be used to design antisense and triple helix-forming oligonucleotides.
Polynucleotides suitable for use in these methods are usually 20 to 40 bases in length and are designed to be complementary
to a region of the gene involved in transcription, for triple-helix formation, or to the mRNA itself, for antisense inhibition.
Both techniques have been demonstrated to be effective in model systems, and the requisite techniques are well known
and involve routine procedures. Triple helix techniques are discussed in, for example, Lee et al., Nucl. Acids Res. 6:
3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991). Antisense
techniques in general are discussed in, for instance, Okano, J. Neurochem. 56:560 (1991) and
OLIGODEOXYNUCLEOTIDES AS ANTISENSE INHIBITORS OF GENE EXPRESSION, CRC Press, Boca Raton, FL (1988).
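
The complementarity requirement for an antisense oligonucleotide can be illustrated as follows. This is a minimal sketch, assuming a DNA oligonucleotide designed against a chosen stretch of an mRNA by simple Watson-Crick pairing; the function name, the default length of 20 bases (the lower end of the range given above) and the example transcript are hypothetical, and no account is taken of secondary structure or target accessibility.

```python
# Watson-Crick complement as DNA; U in the mRNA pairs with A.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def antisense_oligo(mrna, start, length=20):
    """Return a DNA oligonucleotide, written 5' to 3', complementary to
    mrna[start:start + length] (0-based start)."""
    region = mrna[start:start + length].upper()
    if len(region) < length:
        raise ValueError("region extends past the end of the transcript")
    # Complement each base, then reverse to read 5' to 3'.
    return region.translate(COMPLEMENT)[::-1]


# Hypothetical transcript fragment.
transcript = "AUGGCUAAAGGUCCAUUGACCGGA"
oligo = antisense_oligo(transcript, 0, length=6)
```

For the first six bases of the example transcript (AUGGCU) the function returns AGCCAT, which pairs base for base with the target when the two strands are aligned antiparallel.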

The present invention further provides recombinant constructs comprising one or more fragments of the
Staphylococcus aureus genomic fragments and contigs of the present invention. Certain preferred recombinant constructs of
the present invention comprise a vector, such as a plasmid or viral vector, into which a fragment of the Staphylococcus
aureus genome has been inserted, in a forward or reverse orientation. In the case of a vector comprising one of the
ORFs of the present invention, the vector may further comprise regulatory sequences, including for example, a
promoter, operably linked to the ORF. For vectors comprising the EMFs of the present invention, the vector may further
comprise a marker sequence or heterologous ORF operably linked to the EMF.

Large numbers of suitable vectors and promoters are known to those of skill in the art and are commercially
available for generating the recombinant constructs of the present invention. The following vectors are provided by
way of example. Useful bacterial vectors include phagescript, PsiX174, pBluescript SK and KS (+ and -), pNH8a,
pNH16a, pNH18a, pNH46a (available from Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (available
from Pharmacia). Useful eukaryotic vectors include pWLneo, pSV2cat, pOG44, pXT1, pSG (available from Stratagene);
pSVK3, pBPV, pMSG, pSVL (available from Pharmacia).

Promoter regions can be selected from any desired gene using CAT (chloramphenicol transferase) vectors or other
vectors with selectable markers. Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial
promoters include lacI, lacZ, T3, T7, gpt, lambda PR, and trc. Eukaryotic promoters include CMV immediate early, HSV
thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate
vector and promoter is well within the level of ordinary skill in the art.

The present invention further provides host cells containing any one of the isolated fragments of the Staphylococcus
aureus genomic fragments and contigs of the present invention, wherein the fragment has been introduced into the
host cell using known methods. The host cell can be a higher eukaryotic host cell, such as a mammalian cell, a lower
eukaryotic host cell, such as a yeast cell, or a procaryotic cell, such as a bacterial cell.

A polynucleotide of the present invention, such as a recombinant construct comprising an ORF of the present
invention, may be introduced into the host by a variety of well established techniques that are standard in the art, such
as calcium phosphate transfection, DEAE-dextran mediated transfection and electroporation, which are described in,
for instance, Davis, L. et al., BASIC METHODS IN MOLECULAR BIOLOGY (1986).

A host cell containing one of the fragments of the Staphylococcus aureus genomic fragments and contigs of the
present invention can be used in conventional manners to produce the gene product encoded by the isolated fragment
(in the case of an ORF) or can be used to produce a heterologous protein under the control of the EMF.

The present invention further provides isolated polypeptides encoded by the nucleic acid fragments of the present
invention or by degenerate variants of the nucleic acid fragments of the present invention. By "degenerate variant" is
intended nucleotide fragments which differ from a nucleic acid fragment of the present invention (e.g., an ORF) by
nucleotide sequence but, due to the degeneracy of the Genetic Code, encode an identical polypeptide sequence.

Preferred nucleic acid fragments of the present invention are the ORFs depicted in Tables 2 and 3 which encode 
proteins. 

A variety of methodologies known in the art can be utilized to obtain any one of the isolated polypeptides or proteins
of the present invention. At the simplest level, the amino acid sequence can be synthesized using commercially
available peptide synthesizers. This is particularly useful in producing small peptides and fragments of larger polypeptides.
Such short fragments as may be obtained most readily by synthesis are useful, for example, in generating antibodies
against the native polypeptide, as discussed further below.

In an alternative method, the polypeptide or protein is purified from bacterial cells which naturally produce the
polypeptide or protein. One skilled in the art can readily employ well-known methods for isolating polypeptides and
proteins to isolate and purify polypeptides or proteins of the present invention produced naturally by a bacterial strain,
or by other methods. Methods for isolation and purification that can be employed in this regard include, but are not
limited to, immunochromatography, HPLC, size-exclusion chromatography, ion-exchange chromatography, and
immuno-affinity chromatography.

The polypeptides and proteins of the present invention also can be purified from cells which have been altered to
express the desired polypeptide or protein. As used herein, a cell is said to be altered to express a desired polypeptide
or protein when the cell, through genetic manipulation, is made to produce a polypeptide or protein which it normally
does not produce or which the cell normally produces at a lower level. Those skilled in the art can readily adapt
procedures for introducing and expressing either recombinant or synthetic sequences into eukaryotic or prokaryotic cells
in order to generate a cell which produces one of the polypeptides or proteins of the present invention.

Any host/vector system can be used to express one or more of the ORFs of the present invention. These include,
but are not limited to, eukaryotic hosts such as HeLa cells, CV-1 cells, COS cells, and Sf9 cells, as well as prokaryotic
hosts such as E. coli and B. subtilis. The most preferred cells are those which do not normally express the particular
polypeptide or protein or which express the polypeptide or protein at a low natural level.

"Recombinant," as used herein, means that a polypeptide or protein is derived from recombinant (e.g., microbial
or mammalian) expression systems. "Microbial" refers to recombinant polypeptides or proteins made in bacterial or
fungal (e.g., yeast) expression systems. As a product, "recombinant microbial" defines a polypeptide or protein
essentially free of native endogenous substances and unaccompanied by associated native glycosylation. Polypeptides or
proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation modifications; polypeptides or
proteins expressed in yeast will have a glycosylation pattern different from that expressed in mammalian cells.

"Nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides. Generally, DNA segments encoding the
polypeptides and proteins provided by this invention are assembled from fragments of the Staphylococcus aureus
genome and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene which is
capable of being expressed in a recombinant transcriptional unit comprising regulatory elements derived from a
microbial or viral operon.

"Recombinant expression vehicle or vector" refers to a plasmid or phage or virus or vector, for expressing a
polypeptide from a DNA (RNA) sequence. The expression vehicle can comprise a transcriptional unit comprising an assembly
of (1) genetic regulatory elements necessary for gene expression in the host, including elements required to initiate
and maintain transcription at a level sufficient for suitable expression of the desired polypeptide, including, for example,
promoters and, where necessary, enhancers and a polyadenylation signal; (2) a structural or coding sequence
which is transcribed into mRNA and translated into protein; and (3) appropriate signals to initiate translation at the
beginning of the desired coding region and terminate translation at its end. Structural units intended for use in yeast
or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated
protein by a host cell. Alternatively, where recombinant protein is expressed without a leader or transport sequence,
it may include an N-terminal methionine residue. This residue may or may not be subsequently cleaved from the
expressed recombinant protein to provide a final product.

"Recombinant expression system" means host cells which have stably integrated a recombinant transcriptional
unit into chromosomal DNA or carry the recombinant transcriptional unit extrachromosomally. The cells can be
prokaryotic or eukaryotic. Recombinant expression systems as defined herein will express heterologous polypeptides or
proteins upon induction of the regulatory elements linked to the DNA segment or synthetic gene to be expressed.

Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of
appropriate promoters. Cell-free translation systems can also be employed to produce such proteins using RNAs derived
from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic
and eukaryotic hosts are described in Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, 2nd
Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989), the disclosure of which is hereby
incorporated by reference in its entirety.

Generally, recombinant expression vectors will include origins of replication and selectable markers permitting
transformation of the host cell, e.g., the ampicillin resistance gene of E. coli and the S. cerevisiae TRP1 gene, and a
promoter derived from a highly expressed gene to direct transcription of a downstream structural sequence. Such
promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), alpha-factor,
acid phosphatase, or heat shock proteins, among others. The heterologous structural sequence is assembled
in appropriate phase with translation initiation and termination sequences, and preferably, a leader sequence capable
of directing secretion of translated protein into the periplasmic space or extracellular medium. Optionally, the heterologous
sequence can encode a fusion protein including an N-terminal identification peptide imparting desired characteristics,
e.g., stabilization or simplified purification of expressed recombinant product.

Useful expression vectors for bacterial use are constructed by inserting a structural DNA sequence encoding a
desired protein together with suitable translation initiation and termination signals in operable reading phase with a
functional promoter. The vector will comprise one or more phenotypic selectable markers and an origin of replication
to ensure maintenance of the vector and, when desirable, provide amplification within the host.

Suitable prokaryotic hosts for transformation include strains of Staphylococcus aureus, E. coli, B. subtilis, Salmonella
typhimurium and various species within the genera Pseudomonas, Streptomyces, and Staphylococcus. Others
may also be employed as a matter of choice.

As a representative but non-limiting example, useful expression vectors for bacterial use can comprise a selectable
marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements
of the well known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3
(available from Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM 1 (available from Promega Biotec, Madison,
WI, USA). These pBR322 "backbone" sections are combined with an appropriate promoter and the structural sequence
to be expressed.

Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the
selected promoter, where it is inducible, is derepressed or induced by appropriate means (e.g., temperature shift or
chemical induction) and cells are cultured for an additional period to provide for expression of the induced gene product.
Thereafter cells are typically harvested, generally by centrifugation, disrupted to release expressed protein, generally
by physical or chemical means, and the resulting crude extract is retained for further purification.

Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian
expression systems include the COS-7 lines of monkey kidney fibroblasts, described in Gluzman, Cell 23:175
(1981), and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and
BHK cell lines.

Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also
any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination
sequences, and 5' flanking nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for example,
SV40 origin, early promoter, enhancer, splice, and polyadenylation sites may be used to provide the required
nontranscribed genetic elements.

Recombinant polypeptides and proteins produced in bacterial culture are usually isolated by initial extraction from
cell pellets, followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography steps. Microbial
cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw
cycling, sonication, mechanical disruption, or use of cell lysing agents. Protein refolding steps can be used, as necessary,
in completing configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can
be employed for final purification steps.

An additional aspect of the invention includes Staphylococcus aureus polypeptides which are useful as immunodiagnostic
antigens and/or immunoprotective vaccines, collectively "immunologically useful polypeptides". Such immunologically
useful polypeptides may be selected from the ORFs disclosed herein based on techniques well known
in the art and described elsewhere herein. The inventors have used the following criteria to select several immunologically
useful polypeptides:

As is known in the art, an amino terminal type I signal sequence directs a nascent protein across the plasma and
outer membranes to the exterior of the bacterial cell. Such outer membrane polypeptides are expected to be immunologically
useful. According to Izard, J. W. et al., Mol. Microbiol. 13, 765-773 (1994), polypeptides containing type I
signal sequences have the following physical attributes: the length of the type I signal sequence is approximately
15 to 25 primarily hydrophobic amino acid residues with a net positive charge in the extreme amino terminus; the
central region of the signal sequence must adopt an alpha-helical conformation in a hydrophobic environment; and the
region surrounding the actual site of cleavage is ideally six residues long, with small side-chain amino acids in the -1
and -3 positions.
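
By way of illustration only, the type I attributes listed above can be expressed as a simple computational check. The following Python sketch is not the algorithm used by the inventors. The function name, the residue sets, the five-residue amino terminal window and the "more than half" hydrophobicity threshold are all assumptions made for this example, and the alpha-helical conformation requirement is not tested.

```python
# Hypothetical heuristic for a type I signal sequence; all thresholds are assumptions.
HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic residue set
SMALL = set("AGS")             # assumed "small side-chain" residues
POSITIVE = set("KR")
NEGATIVE = set("DE")


def looks_like_type_i_signal(protein, cleavage_after):
    """Return True if residues 1..cleavage_after resemble a type I signal sequence.

    cleavage_after is the 1-based position of the last signal residue (position -1).
    """
    signal = protein[:cleavage_after]
    # Length of approximately 15 to 25 residues.
    if not 15 <= len(signal) <= 25:
        return False
    # Net positive charge in the extreme amino terminus (first 5 residues, assumed).
    n_term = signal[:5]
    charge = sum(r in POSITIVE for r in n_term) - sum(r in NEGATIVE for r in n_term)
    if charge <= 0:
        return False
    # Primarily hydrophobic: more than half of the remaining residues (assumed cutoff).
    core = signal[5:]
    if sum(r in HYDROPHOBIC for r in core) * 2 <= len(core):
        return False
    # Small side-chain amino acids at the -1 and -3 positions.
    return signal[-1] in SMALL and signal[-3] in SMALL
```

For instance, the illustrative 21-residue sequence MKKTAIAIAVALAGFATVAQA, tested with cleavage_after=21, satisfies all of the checks in this sketch.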

Also known in the art is the type IV signal sequence, which is an example of the several types of functional signal
sequences which exist in addition to the type I signal sequence detailed above. Although functionally related, the type
IV signal sequence possesses a unique set of biochemical and physical attributes (Strom, M. S. and Lory, S., J. Bacteriol.
174, 7345-7351; 1992). Type IV signal sequences typically comprise six to eight amino acids with a net basic charge
followed by an additional sixteen to thirty primarily hydrophobic residues. The cleavage site of a type IV signal sequence
is typically after the initial six to eight amino acids at the extreme amino terminus. In addition, all type IV signal
sequences contain a phenylalanine residue at the +1 site relative to the cleavage site.
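
The type IV attributes can be sketched in the same way. The Python function below is a minimal illustration under stated assumptions and is not the inventors' algorithm: the function name, the residue sets and the "more than half" hydrophobicity cutoff are hypothetical, and only the first sixteen residues after the cleavage site are examined.

```python
# Hypothetical heuristic for a type IV signal sequence; all thresholds are assumptions.
HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic residue set
BASIC = set("KR")
ACIDIC = set("DE")


def find_type_iv_cleavage(protein):
    """Return the leader length (6, 7 or 8) if a type IV pattern is found, else None."""
    for leader_len in (6, 7, 8):
        leader = protein[:leader_len]
        rest = protein[leader_len:]
        # Phenylalanine is required at the +1 site relative to the cleavage site.
        if not rest or rest[0] != "F":
            continue
        # The leader must carry a net basic charge.
        charge = sum(r in BASIC for r in leader) - sum(r in ACIDIC for r in leader)
        if charge <= 0:
            continue
        # At least sixteen residues must follow, and they must be primarily hydrophobic.
        stretch = rest[:16]
        if len(stretch) == 16 and sum(r in HYDROPHOBIC for r in stretch) * 2 > 16:
            return leader_len
    return None
```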

Studies of the cleavage sites of twenty-six bacterial lipoprotein precursors have allowed the definition of a consensus
amino acid sequence for lipoprotein cleavage. Nearly three-fourths of the bacterial lipoprotein precursors examined
contained the sequence L-(A,S)-(G,A)-C at positions -3 to +1, relative to the point of cleavage (Hayashi, S. and Wu,
H. C. Lipoproteins in bacteria. J. Bioenerg. Biomembr. 22, 451-471; 1990).
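
The consensus L-(A,S)-(G,A)-C maps directly onto a regular expression. The sketch below is illustrative: the function name and the 40-residue amino terminal search window are assumptions of this example and are not taken from the cited reference.

```python
import re

# Consensus L-(A,S)-(G,A)-C spans positions -3 to +1; cleavage precedes the cysteine.
LIPOBOX = re.compile(r"L[AS][GA]C")


def find_lipoprotein_cleavage(protein, search_window=40):
    """Return the 1-based position of the +1 cysteine, or None if there is no match.

    search_window (an assumption of this sketch) restricts the search to the
    amino terminal region, where a signal sequence would be located.
    """
    match = LIPOBOX.search(protein[:search_window])
    if match is None:
        return None
    # The 0-based end index of the match equals the 1-based index of the cysteine.
    return match.end()
```

Because the consensus was found in only about three-fourths of the precursors examined, a negative result from such a search does not exclude a lipoprotein.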

It is well known that most anchored proteins found on the surface of gram-positive bacteria possess a highly conserved
carboxy terminal sequence. More than fifty such proteins from organisms such as S. pyogenes, S. mutans, E.
faecalis, S. pneumoniae, and others, have been identified based on their extracellular location and carboxy terminal
amino acid sequence (Fischetti, V. A. Gram-positive commensal bacteria deliver antigens to elicit mucosal and systemic
immunity. ASM News 62, 405-410; 1996). The conserved region is comprised of six charged amino acids at the extreme
carboxy terminus coupled to 15-20 hydrophobic amino acids presumed to function as a transmembrane domain. Immediately
adjacent to the transmembrane domain is a six amino acid sequence conserved in nearly all proteins examined.
The amino acid sequence of this region is L-P-X-T-G-X, where X is any amino acid.
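
The carboxy terminal arrangement described above (L-P-X-T-G-X, then 15-20 hydrophobic residues, then six charged residues) can be sketched as follows. This is a hypothetical illustration and not the inventors' algorithm: the function name, the residue sets and the majority thresholds are assumptions, and the sketch requires the motif to sit immediately adjacent to the hydrophobic stretch.

```python
import re

MOTIF = re.compile(r"LP.TG.")    # L-P-X-T-G-X, where X is any amino acid
HYDROPHOBIC = set("AVLIMFWGC")   # assumed residue set for this sketch
CHARGED = set("KRDEH")           # assumed residue set for this sketch


def find_cell_wall_anchor(protein):
    """Return the 1-based start of an LPXTGX motif followed by a mostly
    hydrophobic stretch and a mostly charged carboxy terminus, else None."""
    for match in MOTIF.finditer(protein):
        tail = protein[match.end():]
        # 15-20 hydrophobic residues plus six charged residues gives 21-26 in total.
        if not 21 <= len(tail) <= 26:
            continue
        membrane, charged_tail = tail[:-6], tail[-6:]
        mostly_hydrophobic = sum(r in HYDROPHOBIC for r in membrane) * 2 > len(membrane)
        mostly_charged = sum(r in CHARGED for r in charged_tail) * 2 >= len(charged_tail)
        if mostly_hydrophobic and mostly_charged:
            return match.start() + 1
    return None
```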

Amino acid sequence similarities to proteins of known function by BLAST enable the assignment of putative
functions to novel amino acid sequences and allow for the selection of proteins thought to function outside the cell
wall. Such proteins are well known in the art and include those described as "lipoprotein", "periplasmic", or "antigen".

An algorithm for selecting antigenic and immunogenic Staphylococcus aureus polypeptides including the foregoing
criteria was developed by the present inventors. Use of the algorithm by the inventors to select immunologically useful
Staphylococcus aureus polypeptides resulted in the selection of several ORFs which are predicted to be outer
membrane-associated proteins. These proteins are identified in Table 4, below, and shown in the Sequence Listing as SEQ
ID NOS:5,192 to 5,255. Thus the amino acid sequence of each of several antigenic Staphylococcus aureus polypeptides
listed in Table 4 can be determined, for example, by locating the amino acid sequence of the ORF in the Sequence
Listing. Likewise the polynucleotide sequence encoding each ORF can be found by locating the corresponding polynucleotide
SEQ ID in Tables 1, 2, or 3, and finding the corresponding nucleotide sequence in the Sequence Listing.

As will be appreciated by those of ordinary skill in the art, although a polypeptide representing an entire ORF may
be the closest approximation to a protein found in vivo, it is not always technically practical to express a complete ORF
in vitro. It may be very challenging to express and purify a highly hydrophobic protein by common laboratory methods.
As a result, the immunologically useful polypeptides described herein as SEQ ID NOS:5,192-5,255 may have been
modified slightly to simplify the production of recombinant protein, and are the preferred embodiments. In general,
nucleotide sequences which encode highly hydrophobic domains, such as those found at the amino terminal signal
sequence, are excluded for enhanced in vitro expression of the polypeptides. Furthermore, any highly hydrophobic
amino acid sequences occurring at the carboxy terminus are also excluded. Such truncated polypeptides include, for
example, the mature forms of the polypeptides expected to exist in nature.

Those of ordinary skill in the art can identify soluble portions of the polypeptides identified in Table 4, and in the case
of truncated polypeptide sequences shown as SEQ ID NOS:5,192-5,255, may obtain the complete predicted amino
acid sequence of each polypeptide by translating the corresponding polynucleotide sequence of the corresponding
ORF listed in Tables 1, 2 and 3 and found in the Sequence Listing.
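
Translating an ORF's polynucleotide sequence into its predicted amino acid sequence is a mechanical step, sketched below in Python. The codon table is the standard genetic code; the function name is hypothetical. The sketch assumes an in-frame coding strand, makes no special provision for alternative bacterial start codons (these are translated as their ordinary amino acids rather than as methionine), and raises a KeyError on ambiguous bases.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(dna):
    """Translate an in-frame coding sequence (5' to 3') up to the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```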

Accordingly, polypeptides comprising the complete amino acid sequence of an immunologically useful polypeptide selected
from the group of polypeptides encoded by the ORFs identified in Table 4, or an amino acid sequence at least 95%
identical thereto, preferably at least 97% identical thereto, and most preferably at least 99% identical thereto, form an
embodiment of the invention; in addition, polypeptides comprising an amino acid sequence selected from the group of
amino acid sequences shown in the Sequence Listing as SEQ ID NOS:5,191-5,255, or an amino acid sequence at least
95% identical thereto, preferably at least 97% identical thereto and most preferably at least 99% identical thereto, form
an embodiment of the invention. Polynucleotides encoding the foregoing polypeptides also form part of the present
invention.

In another aspect, the invention provides a peptide or polypeptide comprising an epitope-bearing portion of a
polypeptide of the invention, particularly those epitope-bearing portions (antigenic regions) identified in Table 4. The
epitope-bearing portion is an immunogenic or antigenic epitope of a polypeptide of the invention. An "immunogenic
epitope" is defined as a part of a protein that elicits an antibody response when the whole protein is the immunogen.
On the other hand, a region of a protein molecule to which an antibody can bind is defined as an "antigenic epitope."
The number of immunogenic epitopes of a protein generally is less than the number of antigenic epitopes. See, for
instance, Geysen et al., Proc. Natl. Acad. Sci. USA 81:3998-4002 (1983).

As to the selection of peptides or polypeptides bearing an antigenic epitope (i.e., that contain a region of a protein
molecule to which an antibody can bind), it is well known in the art that relatively short synthetic peptides that mimic
part of a protein sequence are routinely capable of eliciting an antiserum that reacts with the partially mimicked protein.
See, for instance, Sutcliffe, J. G., Shinnick, T. M., Green, N. and Lerner, R. A. (1983) "Antibodies that react with
predetermined sites on proteins", Science, 219:660-666. Peptides capable of eliciting protein-reactive sera are frequently
represented in the primary sequence of a protein, can be characterized by a set of simple chemical rules, and
are confined neither to immunodominant regions of intact proteins (i.e., immunogenic epitopes) nor to the amino or
carboxyl terminals. Antigenic epitope-bearing peptides and polypeptides of the invention are therefore useful to raise
antibodies, including monoclonal antibodies, that bind specifically to a polypeptide of the invention. See, for instance,
Wilson et al., Cell 37:767-778 (1984) at 777.

Antigenic epitope-bearing peptides and polypeptides of the invention preferably contain a sequence of at least
seven, more preferably at least nine and most preferably between about 15 to about 30 amino acids contained within
the amino acid sequence of a polypeptide of the invention. Non-limiting examples of antigenic polypeptides
that can be used to generate S. aureus specific antibodies include: a polypeptide comprising peptides shown in Table
4 below. These polypeptide fragments have been determined to bear antigenic epitopes of indicated S. aureus proteins
by the analysis of the Jameson-Wolf antigenic index, a representative sample of which is shown in Figure 3.

The epitope-bearing peptides and polypeptides of the invention may be produced by any conventional means.
See, e.g., Houghten, R. A. (1985) General method for the rapid solid-phase synthesis of large numbers of peptides:
specificity of antigen-antibody interaction at the level of individual amino acids. Proc. Natl. Acad. Sci. USA 82:
5131-5135; this "Simultaneous Multiple Peptide Synthesis (SMPS)" process is further described in U.S. Patent No.
4,631,211 to Houghten et al. (1986). Epitope-bearing peptides and polypeptides of the invention are used to induce
antibodies according to methods well known in the art. See, for instance, Sutcliffe et al., supra; Wilson et al., supra;
Chow, M. et al., Proc. Natl. Acad. Sci. USA 82:910-914; and Bittle, F. J. et al., J. Gen. Virol. 66:2347-2354 (1985).

Immunogenic epitope-bearing peptides of the invention, i.e., those parts of a protein that elicit an antibody response
when the whole protein is the immunogen, are identified according to methods known in the art. See, for instance,
Geysen et al., supra. Further still, U.S. Patent No. 5,194,392 to Geysen (1990) describes a general method of detecting
or determining the sequence of monomers (amino acids or other compounds) which is a topological equivalent of the
epitope (i.e., a "mimotope") which is complementary to a particular paratope (antigen binding site) of an antibody of
interest. More generally, U.S. Patent No. 4,433,092 to Geysen (1989) describes a method of detecting or determining
a sequence of monomers which is a topographical equivalent of a ligand which is complementary to the ligand binding
site of a particular receptor of interest. Similarly, U.S. Patent No. 5,480,971 to Houghten, R. A. et al. (1996) on Peralkylated
Oligopeptide Mixtures discloses linear C1-C7-alkyl peralkylated oligopeptides and sets and libraries of such
peptides, as well as methods for using such oligopeptide sets and libraries for determining the sequence of a peralkylated
oligopeptide that preferentially binds to an acceptor molecule of interest. Thus, non-peptide analogs of the
epitope-bearing peptides of the invention also can be made routinely by these methods.

Table 4 lists immunologically useful polypeptides identified by an algorithm which locates novel Staphylococcus
aureus outer membrane proteins, as is described above. Also listed are epitopes or "antigenic regions" of each of the
identified polypeptides. The antigenic regions, or epitopes, are delineated by two numbers x-y, where x is the number
of the first amino acid in the open reading frame included within the epitope and y is the number of the last amino acid
in the open reading frame included within the epitope. For example, the first epitope in ORF 168-6 is comprised of
amino acids 36 to 45 of SEQ ID NO:5,192, as is described in Table 4. The inventors have identified several epitopes
for each of the antigenic polypeptides identified in Table 4. Accordingly, forming part of the present invention are
polypeptides comprising an amino acid sequence of one or more antigenic regions identified in Table 4. The invention
further provides polynucleotides encoding such polypeptides.
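
The x-y notation uses 1-based, inclusive amino acid positions, which differs from the 0-based, half-open slicing used by many programming languages. The short Python sketch below shows the conversion; the function name is hypothetical, and the sequence passed to it would have to be taken from the Sequence Listing.

```python
def extract_epitope(orf_protein, region):
    """Return the residues covered by an 'x-y' antigenic region (1-based, inclusive)."""
    first, last = (int(n) for n in region.split("-"))
    if not 1 <= first <= last <= len(orf_protein):
        raise ValueError(f"region {region!r} lies outside the sequence")
    # Convert the 1-based inclusive range to a 0-based half-open slice.
    return orf_protein[first - 1:last]
```

Applied to a region written as "36-45", the function returns the ten residues at positions 36 through 45 of the supplied sequence.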

The present invention further includes isolated polypeptides, proteins and nucleic acid molecules which are substantially
equivalent to those herein described. As used herein, substantially equivalent can refer both to nucleic acid
and amino acid sequences, for example a mutant sequence, that varies from a reference sequence by one or more
substitutions, deletions, or additions, the net effect of which does not result in an adverse functional dissimilarity between
reference and subject sequences. For purposes of the present invention, sequences having equivalent biological
activity and equivalent expression characteristics are considered substantially equivalent. For purposes of determining
equivalence, truncation of the mature sequence should be disregarded.

The invention further provides methods of obtaining homologs, from other strains of Staphylococcus aureus, of the
fragments of the Staphylococcus aureus genome of the present invention and homologs of the proteins encoded by
the ORFs of the present invention. As used herein, a sequence or protein of Staphylococcus aureus is defined as a
homolog of one of the Staphylococcus aureus fragments or contigs, or of a protein encoded by one of the ORFs
of the present invention, if it shares significant homology to one of the fragments of the Staphylococcus aureus genome
of the present invention or a protein encoded by one of the ORFs of the present invention. Specifically, by using the
sequence disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization,
one skilled in the art can obtain homologs.

As used herein, two nucleic acid molecules or proteins are said to "share significant homology" if the two contain
regions which possess greater than 85% sequence (amino acid or nucleic acid) homology. Preferred homologs in this
regard are those with more than 90% homology. Especially preferred are those with 93% or more homology. Among
especially preferred homologs, those with 95% or more homology are particularly preferred. Very particularly preferred
among these are those with 97% homology, and even more particularly preferred among those are homologs with 99% or
more homology. The most preferred homologs among these are those with 99.9% homology or more. It will be understood
that, among measures of homology, identity is particularly preferred in this regard.
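
Since identity is the preferred measure, the graded thresholds above can be applied to a pair of sequences that have already been aligned. The Python sketch below is illustrative only: the function names are hypothetical, the alignment itself is assumed to have been produced elsewhere, and identity is computed over all alignment columns, which is one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over all columns of two pre-aligned sequences ('-' marks a gap)."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# (threshold, strict) pairs from the text: "greater than 85%" and "more than 90%"
# are strict; the remaining thresholds are treated as "or more".
TIERS = [(99.9, False), (99.0, False), (97.0, False), (95.0, False),
         (93.0, False), (90.0, True), (85.0, True)]


def homology_tier(identity):
    """Return the most stringent threshold met by the identity value, or None."""
    for threshold, strict in TIERS:
        if identity > threshold or (not strict and identity == threshold):
            return threshold
    return None
```

A pair with exactly 90% identity therefore falls in the 85% tier, because the text requires more than 90% homology for the next tier.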

Region specific primers or probes derived from the nucleotide sequence provided in SEQ ID NOS:1-5,191 or from
a nucleotide sequence at least 95%, particularly at least 99%, especially at least 99.5% identical to a sequence of SEQ
ID NOS:1-5,191 can be used to prime DNA synthesis and PCR amplification, as well as to identify colonies containing
cloned DNA encoding a homolog. Methods suitable to this aspect of the present invention are well known and have
been described in great detail in many publications such as, for example, Innis et al., PCR PROTOCOLS, Academic
Press, San Diego, CA (1990).

When using primers derived from SEQ ID NOS:1-5,191 or from a nucleotide sequence having an aforementioned
identity to a sequence of SEQ ID NOS:1-5,191, one skilled in the art will recognize that by employing high stringency
conditions (e.g., annealing at 50-60°C in 6X SSPC and 50% formamide, and washing at 50-65°C in 0.5X SSPC) only
sequences which are greater than 75% homologous to the primer will be amplified. By employing lower stringency
conditions (e.g., hybridizing at 35-37°C in 6X SSPC and 40-45% formamide, and washing at 42°C in 0.5X SSPC),
sequences which are greater than 40-50% homologous to the primer will also be amplified.

When using DNA probes derived from SEQ ID NOS:1-5,191, or from a nucleotide sequence having an aforementioned
identity to a sequence of SEQ ID NOS:1-5,191, for colony/plaque hybridization, one skilled in the art will recognize
that by employing high stringency conditions (e.g., hybridizing at 50-65°C in 5X SSPC and 50% formamide, and
washing at 50-65°C in 0.5X SSPC), sequences having regions which are greater than 90% homologous to the probe
can be obtained, and that by employing lower stringency conditions (e.g., hybridizing at 35-37°C in 5X SSPC and
40-45% formamide, and washing at 42°C in 0.5X SSPC), sequences having regions which are greater than 35-45%
homologous to the probe will be obtained.

Any organism can be used as the source for homologs of the present invention so long as the organism naturally
expresses such a protein or contains genes encoding the same. The most preferred organisms for isolating homologs
are bacteria which are closely related to Staphylococcus aureus.

ILLUSTRATIVE USES OF COMPOSITIONS OF THE INVENTION 

Each ORF provided in Tables 1 and 2 is identified with a function by homology to a known gene or polypeptide.
As a result, one skilled in the art can use the polypeptides of the present invention for commercial, therapeutic and
industrial purposes consistent with the type of putative identification of the polypeptide. Such identifications permit one
skilled in the art to use the Staphylococcus aureus ORFs in a manner similar to the known type of sequences for which
the identification is made; for example, to ferment a particular sugar source or to produce a particular metabolite. A
variety of reviews illustrative of this aspect of the invention are available, including the following reviews on the industrial
use of enzymes, for example, BIOCHEMICAL ENGINEERING AND BIOTECHNOLOGY HANDBOOK, 2nd Ed., Macmillan
Publications, Ltd., NY (1991) and BIOCATALYSTS IN ORGANIC SYNTHESES, Tramper et al., Eds., Elsevier
Science Publishers, Amsterdam, The Netherlands (1985). A variety of exemplary uses that illustrate this and similar
aspects of the present invention are discussed below.

1. Biosynthetic Enzymes

Open reading frames encoding proteins involved in mediating the catalytic reactions involved in intermediary and
macromolecular metabolism, the biosynthesis of small molecules, cellular processes and other functions can be used
for industrial biosynthesis. Such proteins include enzymes involved in the degradation of the intermediary products
of metabolism, enzymes involved in central intermediary metabolism, enzymes involved in respiration, both aerobic
and anaerobic, enzymes involved in fermentation, enzymes involved in ATP proton motor force conversion, enzymes
involved in broad regulatory function, enzymes involved in amino acid synthesis, enzymes involved in nucleotide
synthesis, and enzymes involved in cofactor and vitamin synthesis.

The various metabolic pathways present in Staphylococcus aureus can be identified based on absolute nutritional
requirements as well as by examining the various enzymes identified in Tables 1-3 and SEQ ID NOS:1-5,191.

Of particular interest are polypeptides involved in the degradation of intermediary metabolites as well as non-macromolecular
metabolism. Such enzymes include amylases, glucose oxidases, and catalase.

Proteolytic enzymes are another class of commercially important enzymes. Proteolytic enzymes find use in a
number of industrial processes including the processing of flax and other vegetable fibers, in the extraction, clarification
and depectinization of fruit juices, in the extraction of vegetables' oil and in the maceration of fruits and vegetables to
give unicellular fruits. A detailed review of the proteolytic enzymes used in the food industry is provided in Rombouts
et al., Symbiosis 21:79 (1986) and Voragen et al. in BIOCATALYSTS IN AGRICULTURAL BIOTECHNOLOGY, Whitaker
et al., Eds., American Chemical Society Symposium Series 369:93 (1989).

The metabolism of sugars is an important aspect of the primary metabolism of Staphylococcus aureus. Enzymes
involved in the degradation of sugars, such as, particularly, glucose, galactose, fructose and xylose, can be used in
industrial fermentation. Some of the important sugar transforming enzymes, from a commercial viewpoint, include
sugar isomerases such as glucose isomerase. Other metabolic enzymes have found commercial use, such as glucose
oxidases, which produce ketogulonic acid (KGA). KGA is an intermediate in the commercial production of ascorbic
acid using the Reichstein's procedure, as described in Krueger et al., Biotechnology 6(A), Rhine et al., Eds., Verlag
Press, Weinheim, Germany (1984).

Glucose oxidase (GOD) is commercially available and has been used in purified form as well as in an immobilized
form for the deoxygenation of beer. See, for instance, Hartmeir et al., Biotechnology Letters 1:21 (1979). The most
important application of GOD is the industrial scale fermentation of gluconic acid. There is a market for gluconic acids,
which are used in the detergent, textile, leather, photographic, pharmaceutical, food, feed and concrete industries, as
described, for example, in Bigelis et al., beginning on page 357 in GENE MANIPULATIONS AND FUNGI; Benett et al., Eds.,
Academic Press, New York (1985). In addition to industrial applications, GOD has found applications in medicine for
quantitative determination of glucose in body fluids and, recently, in biotechnology for analyzing syrups from starch and
cellulose hydrolysates. This application is described in Owusu et al., Biochem. et Biophysica. Acta. 872:83 (1986), for
instance.

The main sweetener used in the world today is sugar which comes from sugar beets and sugar cane. In the field
of industrial enzymes, the glucose isomerase process shows the largest expansion in the market today. Initially, soluble
enzymes were used and later immobilized enzymes were developed (Krueger et al., Biotechnology, The Textbook of
Industrial Microbiology, Sinauer Associated Incorporated, Sunderland, Massachusetts (1990)). Today, the use of
glucose-produced high fructose syrups is by far the largest industrial business using immobilized enzymes. A review of
the industrial use of these enzymes is provided by Jorgensen, Starch 40:307 (1988).

Proteinases, such as alkaline serine proteinases, are used as detergent additives and thus represent one of the
largest volumes of microbial enzymes used in the industrial sector. Because of their industrial importance, there is a
large body of published and unpublished information regarding the use of these enzymes in industrial processes. (See
Faultman et al., Acid Proteases Structure Function and Biology, Tang, J., ed., Plenum Press, New York (1977) and
Godfrey et al., Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983) and Hepner et al., Report Industrial Enzymes
by 1990, Hel Hepner & Associates, London (1986)).

Another class of commercially usable proteins of the present invention are the microbial lipases, described by, for
instance, Macrae et al., Philosophical Transactions of the Chiral Society of London 310:227 (1985) and Poserke, Journal
of the American Oil Chemist Society 61:1758 (1984). A major use of lipases is in the fat and oil industry for the
production of neutral glycerides using lipase catalyzed inter-esterification of readily available triglycerides. Applications
of lipases include use as a detergent additive to facilitate the removal of fats from fabrics in the course of the
washing procedures.

The use of enzymes, and in particular microbial enzymes, as catalysts for key steps in the synthesis of complex
organic molecules is gaining popularity at a great rate. One area of great interest is the preparation of chiral intermediates.
Preparation of chiral intermediates is of interest to a wide range of synthetic chemists, particularly those scientists
involved with the preparation of new pharmaceuticals, agrochemicals, fragrances and flavors. (See Davies et al., Recent
Advances in the Generation of Chiral Intermediates Using Enzymes, CRC Press, Boca Raton, Florida (1990)).
The following reactions catalyzed by enzymes are of interest to organic chemists: hydrolysis of carboxylic acid esters,
phosphate esters, amides and nitriles, esterification reactions, trans-esterification reactions, synthesis of amides, reduction
of alkanones and oxoalkanates, oxidation of alcohols to carbonyl compounds, oxidation of sulfides to sulfoxides,
and carbon bond forming reactions such as the aldol reaction.

When considering the use of an enzyme encoded by one of the ORFs of the present invention for biotransformation
and organic synthesis it is sometimes necessary to consider the respective advantages and disadvantages of using a
microorganism as opposed to an isolated enzyme. The pros and cons of using a whole cell system on the one hand or an
isolated partially purified enzyme on the other hand have been described in detail by Bud et al., Chemistry in Britain
(1987), p. 127.

Aminotransferases, enzymes involved in the biosynthesis and metabolism of amino acids, are useful in the catalytic
production of amino acids. The advantages of using microbial based enzyme systems are that the aminotransferase
enzymes catalyze the stereoselective synthesis of only L-amino acids and generally possess uniformly high catalytic
rates. A description of the use of aminotransferases for amino acid production is provided by Roselle-David, Methods
of Enzymology 136:479 (1987).

Another category of useful proteins encoded by the ORFs of the present invention includes enzymes involved in
nucleic acid synthesis, repair, and recombination. A variety of commercially important enzymes have previously been
isolated from members of Staphylococcus aureus. These include Sau3A and Sau96I.

2. Generation of Antibodies

As described here, the proteins of the present invention, as well as homologs thereof, can be used in a variety of
procedures and methods known in the art which are currently applied to other proteins. The proteins of the present
invention can further be used to generate an antibody which selectively binds the protein. Such antibodies can be
either monoclonal or polyclonal antibodies, as well as fragments of these antibodies, and humanized forms.

The invention further provides antibodies which selectively bind to one of the proteins of the present invention and
hybridomas which produce these antibodies. A hybridoma is an immortalized cell line which is capable of secreting a
specific monoclonal antibody.

In general, techniques for preparing polyclonal and monoclonal antibodies as well as hybridomas capable of producing
the desired antibody are well known in the art (Campbell, A. M., MONOCLONAL ANTIBODY TECHNOLOGY:
LABORATORY TECHNIQUES IN BIOCHEMISTRY AND MOLECULAR BIOLOGY, Elsevier Science Publishers, Amsterdam,
The Netherlands (1984); St. Groth et al., J. Immunol. Methods 35:1-21 (1980); Kohler and Milstein, Nature
256:495-497 (1975)). Such techniques include the trioma technique and the human B-cell hybridoma technique (Kozbor
et al., Immunology Today 4:72 (1983); pgs. 77-96 of Cote et al., in MONOCLONAL ANTIBODIES AND CANCER THERAPY,
Alan R. Liss, Inc. (1985)).

Any animal (mouse, rabbit, etc.) which is known to produce antibodies can be immunized with the pseudogene
polypeptide. Methods for immunization are well known in the art. Such methods include subcutaneous or intraperitoneal
injection of the polypeptide. One skilled in the art will recognize that the amount of the protein encoded by the ORF of
the present invention used for immunization will vary based on the animal which is immunized, the antigenicity of the
peptide and the site of injection.

The protein which is used as an immunogen may be modified or administered in an adjuvant in order to increase
the protein's antigenicity. Methods of increasing the antigenicity of a protein are well known in the art and include, but
are not limited to, coupling the antigen with a heterologous protein (such as globulin or galactosidase) or through the
inclusion of an adjuvant during immunization.

For monoclonal antibodies, spleen cells from the immunized animals are removed, fused with myeloma cells, such 
as SP2/0-Ag14 myeloma cells, and allowed to become monoclonal antibody producing hybridoma cells. 

Any one of a number of methods well known in the art can be used to identify the hybridoma cell which produces
an antibody with the desired characteristics. These include screening the hybridomas with an ELISA assay, western
blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res. 175: 109-124 (1988)).

Hybridomas secreting the desired antibodies are cloned and the class and subclass are determined using procedures
known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and
Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)).

Techniques described for the production of single chain antibodies (U. S. Patent 4,946,778) can be adapted to
produce single chain antibodies to proteins of the present invention.

For polyclonal antibodies, antibody containing antisera is isolated from the immunized animal and is screened for 
the presence of antibodies with the desired specificity using one of the above-described procedures. 

The present invention further provides the above-described antibodies in detectably labelled form. Antibodies can
be detectably labelled through the use of radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels
(such as horseradish peroxidase, alkaline phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.),
paramagnetic atoms, etc. Procedures for accomplishing such labelling are well known in the art, for example see
Sternberger et al., J. Histochem. Cytochem. 18:315 (1970); Bayer, E. A. et al., Meth. Enzym. 62:308 (1979); Engval,
E. et al., Immunol. 109:129 (1972); Goding, J. W., J. Immunol. Meth. 13:215 (1976).

The labeled antibodies of the present invention can be used for in vitro, in vivo, and in situ assays to identify cells
or tissues in which a fragment of the Staphylococcus aureus genome is expressed.

The present invention further provides the above-described antibodies immobilized on a solid support. Examples
of such solid supports include plastics such as polycarbonate, complex carbohydrates such as agarose and sepharose,
acrylic resins such as polyacrylamide and latex beads. Techniques for coupling antibodies to such solid supports
are well known in the art (Weir, D. M. et al., "Handbook of Experimental Immunology" 4th Ed., Blackwell Scientific
Publications, Oxford, England, Chapter 10 (1986); Jacoby, W. D. et al., Meth. Enzym. 34, Academic Press, N. Y. (1974)).
The immobilized antibodies of the present invention can be used for in vitro, in vivo, and in situ assays as well as for
immunoaffinity purification of the proteins of the present invention.

3. Diagnostic Assays and Kits

The present invention further provides methods to identify the expression of one of the ORFs of the present
invention, or homolog thereof, in a test sample, using one of the DFs, antigens or antibodies of the present invention.
In detail, such methods comprise incubating a test sample with one or more of the antibodies, or one or more of
the DFs, or one or more antigens of the present invention and assaying for binding of the DFs, antigens or antibodies
to components within the test sample.

Conditions for incubating a DF, antigen or antibody with a test sample vary. Incubation conditions depend on the
format employed in the assay, the detection methods employed, and the type and nature of the DF or antibody used
in the assay. One skilled in the art will recognize that any one of the commonly available hybridization, amplification
or immunological assay formats can readily be adapted to employ the DFs, antigens or antibodies of the present
invention. Examples of such assays can be found in Chard, T., An Introduction to Radioimmunoassay and Related
Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands (1986); Bullock, G. R. et al., Techniques in
Immunocytochemistry, Academic Press, Orlando, FL, Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, P., Practice
and Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science
Publishers, Amsterdam, The Netherlands (1985); and PCT publication WO95/32291, all of which are hereby
incorporated herein by reference.

The test samples of the present invention include cells, protein or membrane extracts of cells, or biological fluids
such as sputum, blood, serum, plasma, or urine. The test sample used in the above-described method will vary based
on the assay format, nature of the detection method and the tissues, cells or extracts used as the sample to be assayed.
Methods for preparing protein extracts or membrane extracts of cells are well known in the art and can readily be
adapted in order to obtain a sample which is compatible with the system utilized.

In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry 
out the assays of the present invention. 

Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers
which comprises: (a) a first container comprising one of the DFs, antigens or antibodies of the present invention; and
(b) one or more other containers comprising one or more of the following: wash reagents, reagents capable of detecting
the presence of a bound DF, antigen or antibody.

In detail, a compartmentalized kit includes any kit in which reagents are contained in separate containers. Such
containers include small glass containers, plastic containers or strips of plastic or paper. Such containers allow one
to efficiently transfer reagents from one compartment to another compartment such that the samples and reagents are
not cross-contaminated, and the agents or solutions of each container can be added in a quantitative fashion from one
compartment to another. Such containers will include a container which will accept the test sample, a container which
contains the antibodies used in the assay, containers which contain wash reagents (such as phosphate buffered saline,
Tris-buffers, etc.), and containers which contain the reagents used to detect the bound antibody, antigen or DF.

Types of detection reagents include labelled nucleic acid probes, labelled secondary antibodies, or in the
alternative, if the primary antibody is labelled, the enzymatic, or antibody binding reagents which are capable of reacting with
the labelled antibody. One skilled in the art will readily recognize that the disclosed DFs, antigens and antibodies of the
present invention can be readily incorporated into one of the established kit formats which are well known in the art.

4. Screening Assay for Binding Agents 

Using the isolated proteins of the present invention, the present invention further provides methods of obtaining
and identifying agents which bind to a protein encoded by one of the ORFs of the present invention or to one of the
fragments and the Staphylococcus aureus fragment and contigs herein described.

In general, such methods comprise steps of: 

(a) contacting an agent with an isolated protein encoded by one of the ORFs of the present invention, or an isolated
fragment of the Staphylococcus aureus genome; and 

(b) determining whether the agent binds to said protein or said fragment. 

The agents screened in the above assay can be, but are not limited to, peptides, carbohydrates, vitamin derivatives, 
or other pharmaceutical agents. The agents can be selected and screened at random or rationally selected or designed 
using protein modeling techniques. 

For random screening, agents such as peptides, carbohydrates, pharmaceutical agents and the like are selected 
at random and are assayed for their ability to bind to the protein encoded by the ORF of the present invention. 

Alternatively, agents may be rationally selected or designed. As used herein, an agent is said to be "rationally
selected or designed" when the agent is chosen based on the configuration of the particular protein. For example, one
skilled in the art can readily adapt currently available procedures to generate peptides, pharmaceutical agents and the
like capable of binding to a specific peptide sequence in order to generate rationally designed antipeptide peptides,
for example see Hurby et al., "Application of Synthetic Peptides: Antisense Peptides," in Synthetic Peptides, A User's
Guide, W. H. Freeman, NY (1992), pp. 289-307, and Kaspczak et al., Biochemistry 28:9230-8 (1989), or pharmaceutical
agents, or the like.

In addition to the foregoing, one class of agents of the present invention, as broadly described, can be used to 
control gene expression through binding to one of the ORFs or EMFs of the present invention. As described above, 
such agents can be randomly screened or rationally designed/selected. Targeting the ORF or EMF allows a skilled 
artisan to design sequence specific or element specific agents, modulating the expression of either a single ORF or
multiple ORFs which rely on the same EMF for expression control. 

One class of DNA binding agents are agents which contain base residues which hybridize or form a triple helix by 
binding to DNA or RNA. Such agents can be based on the classic phosphodiester, ribonucleic acid backbone, or can 
be a variety of sulfhydryl or polymeric derivatives which have base attachment capacity.

Agents suitable for use in these methods usually contain 20 to 40 bases and are designed to be complementary
to a region of the gene involved in transcription (triple helix - see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney
et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself (antisense - Okano,
J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca
Raton, FL (1988)). Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense
RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated
to be effective in model systems. Information contained in the sequences of the present invention can be used to design
antisense and triple helix-forming oligonucleotides, and other DNA binding agents.
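
As an illustration of the antisense case only, the sketch below derives a 20 to 40 base oligonucleotide complementary to a chosen window of a target sequence. The function names, the example sequence, and the choice to write the target in the DNA alphabet are assumptions made for illustration; they are not taken from the sequences of the present invention, and no claim is made about the efficacy of any oligonucleotide so derived.

```python
# Minimal sketch (hypothetical helper names): derive an antisense
# oligonucleotide as the reverse complement of a target window.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def antisense_oligo(target: str, start: int, length: int = 20) -> str:
    """Return an oligo complementary to target[start:start+length].

    The 20 to 40 base limit follows the range stated in the text.
    """
    if not 20 <= length <= 40:
        raise ValueError("length must be between 20 and 40 bases")
    region = target[start:start + length]
    if len(region) != length:
        raise ValueError("window runs past the end of the target")
    if set(region.upper()) - set("ACGT"):
        raise ValueError("target window contains non-ACGT characters")
    return reverse_complement(region)


# Hypothetical target sequence, written 5' to 3' in the DNA alphabet.
example_target = "ATGAAAGGTCTGACCGATTTAGCAGGC"
oligo = antisense_oligo(example_target, start=0, length=20)
print(oligo)
```

The returned string is read 5' to 3' and pairs base-for-base with the selected window of the target.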

5. Pharmaceutical Compositions and Vaccines

The present invention further provides pharmaceutical agents which can be used to modulate the growth or
pathogenicity of Staphylococcus aureus, or another related organism, in vivo or in vitro. As used herein, a "pharmaceutical
agent" is defined as a composition of matter which can be formulated using known techniques to provide a
pharmaceutical composition. As used herein, the "pharmaceutical agents of the present invention" refers to the pharmaceutical
agents which are derived from the proteins encoded by the ORFs of the present invention or are agents which are
identified using the herein described assays.

As used herein, a pharmaceutical agent is said to "modulate the growth or pathogenicity of Staphylococcus aureus
or a related organism, in vivo or in vitro," when the agent reduces the rate of growth, rate of division, or viability of the
organism in question. The pharmaceutical agents of the present invention can modulate the growth or pathogenicity
of an organism in many fashions, although an understanding of the underlying mechanism of action is not needed to
practice the use of the pharmaceutical agents of the present invention. Some agents will modulate the growth or
pathogenicity by binding to an important protein thus blocking the biological activity of the protein, while other agents may
bind to a component of the outer surface of the organism blocking attachment or rendering the organism more prone
to attack by the body's natural immune system. Alternatively, the agent may comprise a protein encoded by one of the ORFs
of the present invention and serve as a vaccine. The development and use of vaccines derived from membrane
associated polypeptides are well known in the art. The inventors have identified particularly preferred immunogenic
Staphylococcus aureus polypeptides for use as vaccines. Such immunogenic polypeptides are described above and
summarized in Table 4, below.

As used herein, a "related organism" is a broad term which refers to any organism whose growth or pathogenicity 
can be modulated by one of the pharmaceutical agents of the present invention. In general, such an organism will 
contain a homolog of the protein which is the target of the pharmaceutical agent or the protein used as a vaccine. As
such, related organisms do not need to be bacterial but may be fungal or viral pathogens. 

The pharmaceutical agents and compositions of the present invention may be administered in a convenient
manner, such as by the oral, topical, intravenous, intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal
routes. The pharmaceutical compositions are administered in an amount which is effective for treatment and/or
prophylaxis of the specific indication. In general, they are administered in an amount of at least about 1 mg/kg body weight
and in most cases they will be administered in an amount not in excess of about 1 g/kg body weight per day. In most
cases, the dosage is from about 0.1 mg/kg to about 10 g/kg body weight daily, taking into account the routes of
administration, symptoms, etc.

The agents of the present invention can be used in native form or can be modified to form a chemical derivative.
As used herein, a molecule is said to be a "chemical derivative" of another molecule when it contains additional chemical
moieties not normally a part of the molecule. Such moieties may improve the molecule's solubility, absorption, biological
half life, etc. The moieties may alternatively decrease the toxicity of the molecule, eliminate or attenuate any undesirable
side effect of the molecule, etc. Moieties capable of mediating such effects are disclosed in, among other sources,
REMINGTON'S PHARMACEUTICAL SCIENCES (1980) cited elsewhere herein.

For example, such moieties may change an immunological character of the functional derivative, such as affinity
for a given antibody. Such changes in immunomodulation activity are measured by the appropriate assay, such as a
competitive type immunoassay. Modifications of such protein properties as redox or thermal stability, biological
half-life, hydrophobicity, susceptibility to proteolytic degradation or the tendency to aggregate with carriers or into multimers
also may be effected in this way and can be assayed by methods well known to the skilled artisan.

The therapeutic effects of the agents of the present invention may be obtained by providing the agent to a patient
by any suitable means (e.g., inhalation, intravenously, intramuscularly, subcutaneously, enterally, or parenterally). It is
preferred to administer the agent of the present invention so as to achieve an effective concentration within the blood
or tissue in which the growth of the organism is to be controlled. To achieve an effective blood concentration, the
preferred method is to administer the agent by injection. The administration may be by continuous infusion, or by single
or multiple injections.

In providing a patient with one of the agents of the present invention, the dosage of the administered agent will
vary depending upon such factors as the patient's age, weight, height, sex, general medical condition, previous medical
history, etc. In general, it is desirable to provide the recipient with a dosage of agent which is in the range of from about
1 pg/kg to 10 mg/kg (body weight of patient), although a lower or higher dosage may be administered. The
therapeutically effective dose can be lowered by using combinations of the agents of the present invention or another agent.

As used herein, two or more compounds or agents are said to be administered "in combination" with each other
when either (1) the physiological effects of each compound, or (2) the serum concentrations of each compound can
be measured at the same time. The composition of the present invention can be administered concurrently with, prior
to, or following the administration of the other agent.

The agents of the present invention are intended to be provided to recipient subjects in an amount sufficient to 
decrease the rate of growth (as defined above) of the target organism. 

The administration of the agent(s) of the invention may be for either a "prophylactic" or "therapeutic" purpose.
When provided prophylactically, the agent(s) are provided in advance of any symptoms indicative of the organism's
growth. The prophylactic administration of the agent(s) serves to prevent, attenuate, or decrease the rate of onset of
any subsequent infection. When provided therapeutically, the agent(s) are provided at (or shortly after) the onset of an
indication of infection. The therapeutic administration of the compound(s) serves to attenuate the pathological
symptoms of the infection and to increase the rate of recovery.

The agents of the present invention are administered to a subject, such as a mammal, or a patient, in a
pharmaceutically acceptable form and in a therapeutically effective concentration. A composition is said to be
"pharmacologically acceptable" if its administration can be tolerated by a recipient patient. Such an agent is said to be administered
in a "therapeutically effective amount" if the amount administered is physiologically significant. An agent is
physiologically significant if its presence results in a detectable change in the physiology of a recipient patient.

The agents of the present invention can be formulated according to known methods to prepare pharmaceutically
useful compositions, whereby these materials, or their functional derivatives, are combined in admixture with a
pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive of other human proteins,
e.g., human serum albumin, are described, for example, in REMINGTON'S PHARMACEUTICAL SCIENCES, 16th Ed.,
Osol, A., Ed., Mack Publishing, Easton PA (1980). In order to form a pharmaceutically acceptable composition suitable
for effective administration, such compositions will contain an effective amount of one or more of the agents of the
present invention, together with a suitable amount of carrier vehicle.

Additional pharmaceutical methods may be employed to control the duration of action. Controlled release preparations
may be achieved through the use of polymers to complex or absorb one or more of the agents of the present invention.
The controlled delivery may be effectuated by a variety of well known techniques, including formulation with
macromolecules such as, for example, polyesters, polyamino acids, polyvinylpyrrolidone, ethylenevinylacetate,
methylcellulose, carboxymethylcellulose, or protamine sulfate, adjusting the concentration of the macromolecules and the agent
in the formulation, and by appropriate use of methods of incorporation, which can be manipulated to effectuate a desired
time course of release. Another possible method to control the duration of action by controlled release preparations is
to incorporate agents of the present invention into particles of a polymeric material such as polyesters, polyamino
acids, hydrogels, poly(lactic acid) or ethylene vinylacetate copolymers. Alternatively, instead of incorporating these
agents into polymeric particles, it is possible to entrap these materials in microcapsules prepared, for example, by
coacervation techniques or by interfacial polymerization with, for example, hydroxymethylcellulose or
gelatine-microcapsules and poly(methylmethacrylate) microcapsules, respectively, or in colloidal drug delivery systems, for example,
liposomes, albumin microspheres, microemulsions, nanoparticles, and nanocapsules or in macroemulsions. Such
techniques are disclosed in REMINGTON'S PHARMACEUTICAL SCIENCES (1980).

The invention further provides a pharmaceutical pack or kit comprising one or more containers filled with one or
more of the ingredients of the pharmaceutical compositions of the invention. Associated with such container(s) can be
a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals
or biological products, which notice reflects approval by the agency of manufacture, use or sale for human
administration.

In addition, the agents of the present invention may be employed in conjunction with other therapeutic compounds.

6. Shot-Gun Approach to Megabase DNA Sequencing

The present invention further demonstrates that a large sequence can be sequenced using a random shotgun 
approach. This procedure, described in detail in the examples that follow, has eliminated the up front cost of isolating 
and ordering overlapping or contiguous subclones prior to the start of the sequencing protocols. 

Certain aspects of the present invention are described in greater detail in the examples that follow. The examples
are provided by way of illustration. Other aspects and embodiments of the present invention are contemplated by the
inventors, as will be clear to those of skill in the art from reading the present disclosure.

ILLUSTRATIVE EXAMPLES

LIBRARIES AND SEQUENCING

1. Shotgun Sequencing Probability Analysis

The overall strategy for a shotgun approach to whole genome sequencing follows from the Lander and Waterman
(Lander and Waterman, Genomics 2: 231 (1988)) application of the equation for the Poisson distribution. According
to this treatment, the probability, $P_0$, that any given base in a sequence of size $L$, in nucleotides, is not sequenced after
a certain amount, $n$, in nucleotides, of random sequence has been determined can be calculated by the equation
$P_0 = e^{-m}$, where $m$ is $n/L$, the fold coverage. For instance, for a genome of 2.8 Mb, $m = 1$ when 2.8 Mb of sequence has
been randomly generated (1X coverage). At that point, $P_0 = e^{-1} = 0.37$. The probability that any given base has not
been sequenced is the same as the probability that any region of the whole sequence $L$ has not been determined and,
therefore, is equivalent to the fraction of the whole sequence that has yet to be determined. Thus, at one-fold coverage,
approximately 37% of a polynucleotide of size $L$, in nucleotides, has not been sequenced. When 14 Mb of sequence
has been generated, coverage is 5X for a 2.8 Mb sequence and the unsequenced fraction drops to 0.0067 or 0.67%. 5X coverage
of a 2.8 Mb sequence can be attained by sequencing approximately 17,000 random clones from both insert ends with
an average sequence read length of 410 bp.

Similarly, the total gap length, $G$, is determined by the equation $G = Le^{-m}$, and the average gap size, $g$, follows the
equation $g = L/n$, where $n$ in this expression counts the random sequence reads (about 34,000 at 5X coverage) rather
than nucleotides. Thus, 5X coverage leaves about 240 gaps averaging about 82 bp in size in a sequence of a
polynucleotide 2.8 Mb long.

The treatment above is essentially that of Lander and Waterman, Genomics 2: 231 (1988).
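
The figures quoted in this section can be checked numerically. The following is a minimal sketch, assuming the Poisson treatment described above, a 2.8 Mb genome, and 17,000 clones read from both ends at 410 bp per read; the function and variable names are illustrative only. The gap count is taken here as total gap length divided by average gap size, which is one reading of the text and not necessarily the inventors' exact calculation.

```python
import math


def coverage_stats(genome_bp: float, n_reads: int, read_len_bp: float) -> dict:
    """Poisson (Lander-Waterman style) estimates for random shotgun coverage."""
    sequenced_bp = n_reads * read_len_bp      # total random sequence generated
    m = sequenced_bp / genome_bp              # fold coverage
    p0 = math.exp(-m)                         # probability a base is unsequenced
    total_gap_bp = genome_bp * p0             # G = L * e^-m
    avg_gap_bp = genome_bp / n_reads          # g = L / (number of reads)
    n_gaps = total_gap_bp / avg_gap_bp        # assumed: gaps = G / g
    return {
        "fold_coverage": m,
        "unsequenced_fraction": p0,
        "total_gap_bp": total_gap_bp,
        "avg_gap_bp": avg_gap_bp,
        "n_gaps": n_gaps,
    }


# 17,000 clones sequenced from both insert ends gives 34,000 reads.
stats = coverage_stats(genome_bp=2_800_000, n_reads=2 * 17_000, read_len_bp=410)
for name, value in stats.items():
    print(f"{name}: {value:.4g}")
```

With these inputs the sketch gives a fold coverage just under 5, an unsequenced fraction of roughly 0.7%, an average gap of about 82 bp, and a gap count in the low 230s, in line with the approximate values stated above.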

2. Random Library Construction

In order to approximate the random model described above during actual sequencing, a nearly ideal library of
cloned genomic fragments is required. The following library construction procedure was developed to achieve this end.

Staphylococcus aureus DNA was prepared by phenol extraction. A mixture containing 600 µg DNA in 3.3 ml of
300 mM sodium acetate, 10 mM Tris-HCl, 1 mM Na-EDTA, 30% glycerol was sonicated for 1 min. at 0°C in a Branson
Model 450 Sonicator at the lowest energy setting using a 3 mm probe. The sonicated DNA was ethanol precipitated
and redissolved in 500 µl TE buffer.

To create blunt-ends, a 100 µl aliquot of the resuspended DNA was digested with 5 units of BAL31 nuclease (New
England BioLabs) for 10 min at 30°C in 200 µl BAL31 buffer. The digested DNA was phenol-extracted,
ethanol-precipitated, redissolved in 100 µl TE buffer, and then size-fractionated by electrophoresis through a 1.0% low melting
temperature agarose gel. The section containing DNA fragments 1.6-2.0 kb in size was excised from the gel, and the
LGT agarose was melted and the resulting solution was extracted with phenol to separate the agarose from the DNA.
DNA was ethanol precipitated and redissolved in 20 µl of TE buffer for ligation to vector.

A two-step ligation procedure was used to produce a plasmid library with 97% inserts, of which >99% were single
inserts. The first ligation mixture (50 µl) contained 2 µg of DNA fragments, 2 µg pUC18 DNA (Pharmacia) cut with SmaI
and dephosphorylated with bacterial alkaline phosphatase, and 10 units of T4 ligase (GIBCO/BRL) and was incubated
at 14°C for 4 hr. The ligation mixture then was phenol extracted and ethanol precipitated, and the precipitated DNA
was dissolved in 20 µl TE buffer and electrophoresed on a 1.0% low melting agarose gel. Discrete bands in a ladder
were visualized by ethidium bromide staining and UV illumination and identified by size as insert (i), vector (v), v+i,
v+2i, v+3i, etc. The portion of the gel containing v+i DNA was excised and the v+i DNA was recovered and resuspended
into 20 µl TE. The v+i DNA then was blunt-ended by T4 polymerase treatment for 5 min. at 37°C in a reaction mixture
(50 µl) containing the v+i linears, 500 µM each of the 4 dNTPs, and 9 units of T4 polymerase (New England BioLabs),
under recommended buffer conditions. After phenol extraction and ethanol precipitation the repaired v+i linears were
dissolved in 20 µl TE. The final ligation to produce circles was carried out in a 50 µl reaction containing 5 µl of v+i
linears and 5 units of T4 ligase at 14°C overnight. After 10 min. at 70°C the following day the reaction mixture was
stored at -20°C.

This two-stage procedure resulted in a molecularly random collection of single-insert plasmid recombinants with
minimal contamination from double-insert chimeras (<1%) or free vector (<3%).

Since deviation from randomness can arise from propagation of the DNA in the host, E. coli host cells deficient in all
recombination and restriction functions (A. Greener, Strategies 3(1):5 (1990)) were used to prevent rearrangements,
deletions, and loss of clones by restriction. Furthermore, transformed cells were plated directly on antibiotic diffusion
plates to avoid the usual broth recovery phase which allows multiplication and selection of the most rapidly growing cells.

Plating was carried out as follows. A 100 µl aliquot of Epicurian Coli SURE II Supercompetent Cells (Stratagene
200152) was thawed on ice and transferred to a chilled Falcon 2059 tube on ice. A 1.7 µl aliquot of 1.42 M
beta-mercaptoethanol was added to the aliquot of cells to a final concentration of 25 mM. Cells were incubated on ice for
10 min. A 1 µl aliquot of the final ligation was added to the cells and incubated on ice for 30 min. The cells were heat
pulsed for 30 sec. at 42°C and placed back on ice for 2 min. The outgrowth period in liquid culture was eliminated
from this protocol in order to minimize the preferential growth of any given transformed cell. Instead the transformation
mixture was plated directly on a nutrient rich SOB plate containing a 5 ml bottom layer of SOB agar (5% SOB agar:
20 g tryptone, 5 g yeast extract, 0.5 g NaCl, 1.5% Difco Agar per liter of media). The 5 ml bottom layer is supplemented
with 0.4 ml of 50 mg/ml ampicillin per 100 ml SOB agar. The 15 ml top layer of SOB agar is supplemented with 1 ml
X-Gal (2%), 1 ml MgCl2 (1 M), and 1 ml MgSO4 per 100 ml SOB agar. The 15 ml top layer was poured just prior to plating.
Our titer was approximately 100 colonies/10 µl aliquot of transformation.
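
The reagent concentrations in the plating protocol follow from a simple dilution calculation. The sketch below is illustrative only: it assumes volumes are additive and treats the cell aliquot as exactly 100 µl, and the function name is hypothetical.

```python
def diluted_concentration(stock_conc: float, added_vol: float, initial_vol: float) -> float:
    """Concentration after adding `added_vol` of stock to `initial_vol`.

    Uses C_final = C_stock * V_added / (V_initial + V_added); both volumes
    must be in the same unit, and the result is in the unit of `stock_conc`.
    """
    return stock_conc * added_vol / (initial_vol + added_vol)


# 1.7 ul of 1.42 M beta-mercaptoethanol added to 100 ul of cells, in mM.
bme_mM = diluted_concentration(1.42, 1.7, 100.0) * 1000.0

# 0.4 ml of 50 mg/ml ampicillin added to 100 ml of SOB agar, in ug/ml.
amp_ug_per_ml = diluted_concentration(50.0, 0.4, 100.0) * 1000.0

print(f"beta-mercaptoethanol: {bme_mM:.1f} mM")
print(f"ampicillin: {amp_ug_per_ml:.0f} ug/ml")
```

Under these assumptions the beta-mercaptoethanol comes out slightly under the nominal 25 mM stated in the protocol, and the ampicillin in the bottom layer is close to 200 µg/ml.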

All colonies were picked for template preparation regardless of size. Thus, only clones lost due to "poison" DNA
or deleterious gene products would be deleted from the library, resulting in a slight increase in gap number over that
expected.

3. Random DNA Sequencing

High quality double stranded DNA plasmid templates were prepared using an alkaline lysis method developed in
collaboration with 5Prime 3Prime Inc. (Boulder, CO). Plasmid preparation was performed in a 96-well format for all
stages of DNA preparation from bacterial growth through final DNA purification. Average template concentration was
determined by running 25% of the samples on an agarose gel. DNA concentrations were not adjusted.

Templates were also prepared from a Staphylococcus aureus lambda genomic library. An unamplified library was
constructed in Lambda DASH II vector (Stratagene). Staphylococcus aureus DNA (> 100 kb) was partially digested in
a reaction mixture (200 µl) containing 50 µg DNA, 1X Sau3AI buffer, 20 units Sau3AI for 6 min. at 23°C. The digested
DNA was phenol-extracted and centrifuged over a 10-40% sucrose gradient. Fractions containing genomic DNA of
15-25 kb were recovered by precipitation. One µl of fragments was used with 1 µl of DASH II vector (Stratagene) in
the recommended ligation reaction. One µl of the ligation mixture was used per packaging reaction following the
recommended protocol with the Gigapack II XL Packaging Extract. Phage were plated directly without amplification from
the packaging mixture (after dilution with 500 µl of recommended SM buffer and chloroform treatment). Yield was about
2.5 x 10^9 pfu/µl.

An amplified library was prepared from the primary packaging mixture according to the manufacturer's protocol.
The amplified library is stored frozen in 7% dimethylsulfoxide. The phage titer is approximately 1 x 10^9 pfu/ml.

Mini-liquid lysates (0.1 µl) are prepared from randomly selected plaques and template is prepared by long range
PCR. Samples are PCR amplified using modified T3 and T7 primers, and Elongase Supermix (LTI).

Sequencing reactions are carried out on plasmid templates using a combination of two workstations (BIOMEK
1000 and Hamilton Microlab 2200) and the Perkin-Elmer 9600 thermocycler with Applied Biosystems PRISM Ready
Reaction Dye Primer Cycle Sequencing Kits for the M13 forward (M13-21) and the M13 reverse (M13RP1) primers.
Dye terminator sequencing reactions are carried out on the lambda templates on a Perkin-Elmer 9600 Thermocycler
using the Applied Biosystems Ready Reaction Dye Terminator Cycle Sequencing kits. Modified T7 and T3 primers are
used to sequence the ends of the inserts from the Lambda DASH II library. Sequencing reactions are run on a combination
of ABI 373 DNA Sequencers and ABI 377 DNA Sequencers. All of the dye terminator sequencing reactions are analyzed
using the 2X 9 hour module on the ABI 377. Dye primer reactions are analyzed on a combination of ABI 373 and ABI
377 DNA Sequencers. The overall sequencing success rate is approximately 85% for M13-21 and M13RP1
sequences and 65% for dye-terminator reactions. The average usable read length is 485 bp for M13-21 sequences,
445 bp for M13RP1 sequences, and 375 bp for dye-terminator reactions.
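The success rates and read lengths quoted above can be combined into an average number of usable bases per reaction attempted. The following is a minimal sketch of that arithmetic; the dictionary layout and the function name are illustrative assumptions, and only the percentages and read lengths come from the text.

```python
# Figures quoted in the text: (success rate, average usable read length in bp).
READ_STATS = {
    "M13-21 dye primer": (0.85, 485),
    "M13RP1 dye primer": (0.85, 445),
    "dye terminator": (0.65, 375),
}


def expected_usable_bases(success_rate, read_length_bp):
    """Average usable bases obtained per reaction attempted (hypothetical helper)."""
    return success_rate * read_length_bp


for name, (rate, length) in READ_STATS.items():
    bases = expected_usable_bases(rate, length)
    print(f"{name}: {bases:.1f} usable bp per attempted reaction")
```

Under these figures a dye-primer reaction returns noticeably more usable sequence per attempt than a dye-terminator reaction, because it has both the higher success rate and the longer reads.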

4. Protocol for Automated Cycle Sequencing

The sequencing was carried out using Hamilton Microstation 2200, Perkin Elmer 9600 thermocyclers, ABI 373
and ABI 377 Automated DNA Sequencers. The Hamilton combines pre-aliquoted templates and reaction mixes
consisting of deoxy- and dideoxynucleotides, the thermostable Taq DNA polymerase, fluorescently-labelled sequencing
primers, and reaction buffer. Reaction mixes and templates were combined in the wells of a 96-well thermocycling
plate and transferred to the Perkin Elmer 9600 thermocycler. Thirty consecutive cycles of linear amplification (i.e., one
primer synthesis) steps were performed including denaturation, annealing of primer and template, and extension; i.e.,
DNA synthesis. A heated lid with rubber gaskets on the thermocycling plate prevents evaporation without the need for
an oil overlay.
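The phrase "linear amplification" can be made concrete by comparing it with two-primer PCR. The sketch below assumes idealized per-cycle efficiency and uses hypothetical function names; it illustrates the arithmetic only and is not a model of the actual reaction kinetics.

```python
def linear_yield(templates, cycles, efficiency=1.0):
    """One primer: each cycle copies only the original template strands."""
    return templates * cycles * efficiency


def exponential_yield(templates, cycles, efficiency=1.0):
    """Two primers (standard PCR): products themselves serve as templates."""
    return templates * (1 + efficiency) ** cycles


cycles = 30  # number of cycles stated in the text
print("linear:", linear_yield(1, cycles))
print("exponential:", exponential_yield(1, cycles))
```

With one primer, thirty cycles give at most thirty extension products per template molecule, which is why the amount of starting template matters for cycle sequencing.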

Two sequencing protocols were used: one for dye-labelled primers and a second for dye-labelled dideoxy chain
terminators. The shotgun sequencing involves use of four dye-labelled sequencing primers, one for each of the four
terminator nucleotides. Each dye-primer was labelled with a different fluorescent dye, permitting the four individual
reactions to be combined into one lane of the 373 or 377 DNA Sequencer for electrophoresis, detection, and
base-calling. ABI currently supplies premixed reaction mixes in bulk packages containing all the necessary non-template
reagents for sequencing. Sequencing can be done with both plasmid and PCR-generated templates with both
dye-primers and dye-terminators with approximately equal fidelity, although plasmid templates generally give longer usable
sequences.

Thirty-two reactions were loaded per ABI 373 Sequencer each day and 95 samples can be loaded on an ABI 377
per day. Electrophoresis was run overnight (ABI 373) or for 2 1/2 hours (ABI 377) following the manufacturer's protocols.
Following electrophoresis and fluorescence detection, the ABI 373 or ABI 377 performs automatic lane tracking and
base-calling. The lane-tracking was confirmed visually. Each sequence electropherogram (or fluorescence lane trace)
was inspected visually and assessed for quality. Trailing sequences of low quality were removed and the sequence
itself was loaded via software to a Sybase database (archived daily to 8mm tape). Leading vector polylinker sequence
was removed automatically by a software program. Average edited lengths of sequences from the standard ABI 373
or ABI 377 were around 400 bp and depend mostly on the quality of the template used for the sequencing reaction.

INFORMATICS 


1. Data Management 

A number of information management systems for a large-scale sequencing lab have been developed. (For review
see, for instance, Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System
Sciences, IEEE Computer Society Press, Washington D.C., 585 (1993).) The system used to collect and assemble
the sequence data was developed using the Sybase relational database management system and was designed to
automate data flow wherever possible and to reduce user error. The database stores and correlates all information
collected during the entire operation from template preparation to final analysis of the genome. Because the raw output
of the ABI 373 Sequencers was based on a Macintosh platform and the data management system chosen was based
on a Unix platform, it was necessary to design and implement a variety of multi-user, client-server applications which
allow the raw data as well as analysis results to flow seamlessly into the database with a minimum of user effort.

2. Assembly 

An assembly engine (TIGR Assembler) developed for the rapid and accurate assembly of thousands of sequence
fragments was employed to generate contigs. The TIGR Assembler simultaneously clusters and assembles fragments
of the genome. In order to obtain the speed necessary to assemble more than 10^4 fragments, the algorithm builds a
hash table of 12 bp oligonucleotide subsequences to generate a list of potential sequence fragment overlaps. The
number of potential overlaps for each fragment determines which fragments are likely to fall into repetitive elements.
Beginning with a single seed sequence fragment, TIGR Assembler extends the current contig by attempting to add
the best matching fragment based on oligonucleotide content. The contig and candidate fragment are aligned using a
modified version of the Smith-Waterman algorithm which provides for optimal gapped alignments (Waterman, M. S.,
Methods in Enzymology 164: 765 (1988)). The contig is extended by the fragment only if strict criteria for the quality
of the match are met. The match criteria include the minimum length of overlap, the maximum length of an unmatched
end, and the minimum percentage match. These criteria are automatically lowered by the algorithm in regions of minimal
coverage and raised in regions with a possible repetitive element. The number of potential overlaps for each fragment
determines which fragments are likely to fall into repetitive elements. Fragments representing the boundaries of
repetitive elements and potentially chimeric fragments are often rejected based on partial mismatches at the ends of
alignments and excluded from the current contig. TIGR Assembler is designed to take advantage of clone size information
coupled with sequencing from both ends of each template. It enforces the constraint that sequence fragments from
two ends of the same template point toward one another in the contig and are located within a certain range of base
pairs (definable for each clone based on the known clone size range for a given library).
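The hash-table step described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the TIGR Assembler code: the function names are hypothetical, reverse-complement matches are ignored, and the later Smith-Waterman alignment and match criteria are omitted. Only the idea of indexing 12 bp subsequences to list potential overlaps, and of counting potential overlaps per fragment, comes from the text.

```python
from collections import defaultdict
from itertools import combinations

K = 12  # oligonucleotide length named in the text


def build_kmer_index(fragments, k=K):
    """Map each k-mer to the set of fragment ids that contain it."""
    index = defaultdict(set)
    for frag_id, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return index


def candidate_overlaps(fragments, k=K):
    """Count shared k-mers for every fragment pair sharing at least one."""
    shared = defaultdict(int)
    for ids in build_kmer_index(fragments, k).values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return dict(shared)


def overlap_counts(fragments, k=K):
    """Number of candidate partners per fragment; a high count hints at a repeat."""
    partners = defaultdict(set)
    for a, b in candidate_overlaps(fragments, k):
        partners[a].add(b)
        partners[b].add(a)
    return {frag_id: len(partners[frag_id]) for frag_id in fragments}


# Toy fragments (made up for illustration): f1 and f2 overlap by 15 bases.
fragments = {
    "f1": "ACGTACGTTAGCCATGGATTC",
    "f2": "GTTAGCCATGGATTCAGGCT",
    "f3": "TTTTCCCCGGGGAAAATTTT",
}
print(candidate_overlaps(fragments))
print(overlap_counts(fragments))
```

Indexing fixed-length subsequences avoids aligning every fragment against every other one; only pairs that share at least one 12-mer go on to the more expensive gapped alignment.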

3. Identifying Genes 


The predicted coding regions of the Staphylococcus aureus genome were initially defined with the program zorf,
which finds ORFs of a minimum length. The predicted coding region sequences were used in searches against a
database of all Staphylococcus aureus nucleotide sequences from GenBank (release 92.0), using the BLASTN search
method to identify overlaps of 50 or more nucleotides with at least a 95% identity. Those ORFs with nucleotide sequence
matches are shown in Table 1. The ORFs without such matches were translated to protein sequences and
compared to a non-redundant database of known proteins generated by combining the Swiss-prot, PIR and GenPept
databases. ORFs of at least 80 amino acids that matched a database protein with BLASTP probability less than or
equal to 0.01 are shown in Table 2. The table also lists assigned functions based on the closest match in the databases.
ORFs of at least 120 amino acids that did not match protein or nucleotide sequences in the databases at these levels
are shown in Table 3.
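The three-way sorting of ORFs into Tables 1-3 can be written as a small decision function. The sketch below is an assumption-laden illustration: the function name, argument names, and the way search results are passed in are hypothetical, and no BLAST program is actually invoked. Only the thresholds (50 nucleotides, 95% identity, 80 and 120 amino acids, probability 0.01) come from the text.

```python
def assign_table(orf_len_aa, blastn_hit=None, blastp_prob=None):
    """Return the table an ORF would appear in, or None if it is not listed.

    blastn_hit:  (overlap_nt, percent_identity) of the best nucleotide match, or None.
    blastp_prob: BLASTP probability of the best protein match, or None.
    """
    # Table 1: nucleotide overlap of 50 or more bases at 95% identity or better.
    if blastn_hit is not None:
        overlap_nt, identity = blastn_hit
        if overlap_nt >= 50 and identity >= 95.0:
            return "Table 1"
    # Table 2: ORF of at least 80 amino acids with a protein match at P <= 0.01.
    if blastp_prob is not None and orf_len_aa >= 80 and blastp_prob <= 0.01:
        return "Table 2"
    # Table 3: ORF of at least 120 amino acids with no match at these levels.
    no_protein_match = blastp_prob is None or blastp_prob > 0.01
    if orf_len_aa >= 120 and no_protein_match:
        return "Table 3"
    return None


print(assign_table(200, blastn_hit=(60, 97.0)))
print(assign_table(100, blastp_prob=0.001))
print(assign_table(150))
```

The higher length cutoff for unmatched ORFs reflects the text: without database support, a longer open reading frame is required before an ORF is listed.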

ILLUSTRATIVE APPLICATIONS 


1. Production of an Antibody to a Staphylococcus aureus Protein 

Substantially pure protein or polypeptide is isolated from the transfected or transformed cells using any one of the
methods known in the art. The protein can also be produced in a recombinant prokaryotic expression system, such as
E. coli, or can be chemically synthesized. Concentration of protein in the final preparation is adjusted, for example, by
concentration on an Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the
protein can then be prepared as follows.

2. Monoclonal Antibody Production by Hybridoma Fusion


Monoclonal antibody to epitopes of any of the peptides identified and isolated as described can be prepared from
murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975) or
modifications of the methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein
over a period of a few weeks. The mouse is then sacrificed, and the antibody producing cells of the spleen isolated.
The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells
destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused
cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is continued.
Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay
procedures, such as ELISA, as originally described by Engvall, E., Meth. Enzymol. 70:419 (1980), and modified
methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use.
Detailed procedures for monoclonal antibody production are described in Davis, L. et al., Basic Methods in Molecular
Biology, Elsevier, New York, Section 21-2 (1989).

3. Polyclonal Antibody Production by Immunization 


Polyclonal antiserum containing antibodies to heterogenous epitopes of a single protein can be prepared by
immunizing suitable animals with the expressed protein described above, which can be unmodified or modified to enhance
immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and
the host species. For example, small molecules tend to be less immunogenic than others and may require the use of
carriers and adjuvant. Also, host animals vary in response to site of inoculations and dose, with both inadequate and
excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen administered at multiple
intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis,
J. et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971).

Booster injections can be given at regular intervals, and antiserum harvested when antibody titer thereof, as
determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the
antigen, begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology,
Weir, D., ed., Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum
(about 12 uM). Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described,
for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, second edition, Rose and Friedman, eds., Amer.
Soc. For Microbiology, Washington, D.C. (1980).

Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which
determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively
or qualitatively to identify the presence of antigen in a biological sample. In addition, they are useful in various animal
models of Staphylococcal disease known to those of skill in the art as a means of evaluating the protein used to make
the antibody as a potential vaccine target or as a means of evaluating the antibody as a potential immunotherapeutic
reagent.

3. Preparation of PCR Primers and Amplification of DNA

Various fragments of the Staphylococcus aureus genome, such as those of Tables 1-3 and SEQ ID NOS: 1-5,191,
can be used, in accordance with the present invention, to prepare PCR primers for a variety of uses. The PCR primers
are preferably at least 15 bases, and more preferably at least 18 bases in length. When selecting a primer sequence,
it is preferred that the primer pairs have approximately the same G/C ratio, so that melting temperatures are
approximately the same. The PCR primers and amplified DNA of this Example find use in the Examples that follow.
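The link between G/C content and melting temperature can be illustrated with the Wallace rule, a common rough estimate for short oligonucleotides (2 °C per A or T, 4 °C per G or C). The rule, the function names, the example primers, and the 4 °C tolerance below are assumptions introduced for illustration; the text itself specifies only the minimum primer length and the preference for similar G/C ratios.

```python
def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Wallace-rule melting temperature estimate in degrees Celsius."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def compatible_pair(fwd, rev, min_len=15, max_tm_diff=4):
    """True if both primers meet the minimum length and have similar Tm.

    max_tm_diff is a hypothetical tolerance, not a value from the text.
    """
    long_enough = len(fwd) >= min_len and len(rev) >= min_len
    return long_enough and abs(wallace_tm(fwd) - wallace_tm(rev)) <= max_tm_diff


# Arbitrary 20-base example primers.
fwd = "CAGTAAACACCTCTGATTAC"
rev = "ATAGATAACCAAGATAGATA"
print(gc_fraction(fwd), wallace_tm(fwd))
print(gc_fraction(rev), wallace_tm(rev))
print(compatible_pair(fwd, rev))
```

A pair that fails the check can often be rescued by lengthening the primer with the lower G/C content, which raises its estimated melting temperature.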
4. Gene Expression from DNA Sequences Corresponding to ORFs

A fragment of the Staphylococcus aureus genome provided in Tables 1-3 is introduced into an expression vector
using conventional technology. Techniques to transfer cloned sequences into expression vectors that direct protein
translation in mammalian, yeast, insect or bacterial expression systems are well known in the art. Commercially
available vectors and expression systems are available from a variety of suppliers including Stratagene (La Jolla, California),
Promega (Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and facilitate
proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular
expression organism, as explained by Hatfield et al., U.S. Patent No. 5,082,767, incorporated herein by this reference.

The following is provided as one exemplary method to generate polypeptide(s) from cloned ORFs of the
Staphylococcus aureus genome fragment. Bacterial ORFs generally lack a poly A addition signal. The addition signal sequence
can be added to the construct by, for example, splicing out the poly A addition sequence from pSG5 (Stratagene) using
BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1
(Stratagene) for use in eukaryotic expression systems. pXT1 contains the LTRs and a portion of the gag gene of Moloney
Murine Leukemia Virus. The positions of the LTRs in the construct allow efficient stable transfection. The vector includes
the Herpes Simplex thymidine kinase promoter and the selectable neomycin gene. The Staphylococcus aureus DNA
is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the Staphylococcus
aureus DNA and containing restriction endonuclease sequences for PstI incorporated into the 5' primer and BglII at
the 5' end of the corresponding Staphylococcus aureus DNA 3' primer, taking care to ensure that the Staphylococcus
aureus DNA is positioned such that it is followed by the poly A addition sequence. The purified fragment obtained from
the resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, digested with BglII, purified and
ligated to pXT1, now containing a poly A addition sequence and digested with BglII.
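The primer design described above, in which a PstI site is added to the 5' primer and a BglII site to the 5' end of the 3' primer, can be sketched as follows. The recognition sequences (CTGCAG for PstI, AGATCT for BglII) are standard; the function names, the 18-base annealing length, and the toy insert are illustrative assumptions, and the extra 5' flanking bases often added to help enzymes cut near a fragment end are left out.

```python
PSTI_SITE = "CTGCAG"   # PstI recognition sequence
BGLII_SITE = "AGATCT"  # BglII recognition sequence

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_tailed_primers(insert_seq, anneal_len=18):
    """Return (5' primer, 3' primer), each carrying a restriction site at its 5' end."""
    if len(insert_seq) < 2 * anneal_len:
        raise ValueError("insert is too short for two non-overlapping primers")
    fwd = PSTI_SITE + insert_seq[:anneal_len]
    rev = BGLII_SITE + reverse_complement(insert_seq[-anneal_len:])
    return fwd, rev


# Toy insert, made up for illustration.
insert = "ATGAAACCCGGGTTTAAACCCGGGTTTAAACCCGGGTAA"
fwd, rev = design_tailed_primers(insert)
print(fwd)
print(rev)
```

Placing the BglII site on the 3' primer puts it downstream of the coding sequence, consistent with the requirement that the amplified DNA be followed by the poly A addition sequence in the BglII-digested vector.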

The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island,
New York) under conditions outlined in the product specification. Positive transfectants are selected after growing the
transfected cells in 600 ug/ml G418 (Sigma, St. Louis, Missouri). The protein is preferably released into the supernatant.
However, if the protein has membrane binding domains, the protein may additionally be retained within the cell or
expression may be restricted to the cell surface. Since it may be necessary to purify and locate the transfected product,
synthetic 15-mer peptides synthesized from the predicted Staphylococcus aureus DNA sequence are injected into
mice to generate antibody to the polypeptide encoded by the Staphylococcus aureus DNA.

Alternatively, and if antibody production is not possible, the Staphylococcus aureus DNA sequence is additionally
incorporated into eukaryotic expression vectors and expressed as, for example, a globin fusion. Antibody to the globin
moiety then is used to purify the chimeric protein. Corresponding protease cleavage sites are engineered between the
globin moiety and the polypeptide encoded by the Staphylococcus aureus DNA so that the latter may be freed from
the former by simple protease digestion. One useful expression vector for generating globin chimerics is pSG5
(Stratagene). This vector encodes a rabbit globin. Intron II of the rabbit globin gene facilitates splicing of the expressed
transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These
techniques are well known to those skilled in the art of molecular biology. Standard methods are published in methods
texts such as Davis et al., cited elsewhere herein, and many of the methods are available from the technical assistance
representatives from Stratagene, Life Technologies, Inc., or Promega. Polypeptides of the invention also may be
produced using in vitro translation systems such as In vitro Express(TM) Translation Kit (Stratagene).

While the present invention has been described in some detail for purposes of clarity and understanding, one
skilled in the art will appreciate that various changes in form and detail can be made without departing from the true
scope of the invention.

All patents, patent applications and publications referred to above are hereby incorporated by reference.
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202-211 1 ! i 




101-119 . 


139-154 


ICC T Q 1 














i 


1 








1 


n ? 7 


136-149 


197-211 


21 8-229 


253-273 


■ ■ 1 
I 


442 1 


199-210 : 


247-257 


264-277 


287-309 


1 
















^04 ? 


178-187 


250-259 










44_1 












161_4 








\ 

4 


46.5 


131-141 ■ 


162-176 


206-21 5 


243-252 


264-273 ; 


285-294 


942_1 








1 
1 


5_4 


189-205 


230-239 


246-264 


301-318 


340-354 


378-387 


20.4 


202-212 


217-234 


260-275 


314-336 


366-373 


380-391 


3Z8-2 








i 


520_2 








t 


7711 


145-154 








999_1 1 








853_1 






1 i 


287.1 


154-164 




1 1 


288-2 






1 




596_2 


121-130 








217_5 


244-253 


259-268 


288-297 


302-311 I 




217_6 


! 144-158 


174-183 


188-197 


207-216 


! 226-242 




52B 3 




1 „ — 




171.11 • 




1 ! 


63 4 1 






1 


353 2 ! 








743'_1 


' 197-207 








! 


34^2 4 ; 






J i 


69.3 


; 195-211 










70.6 


206-215 


263-272 


! 291-301 


331-340 


358-371 


' 390-414 


129.2 


: 117-127 


141-157 


= 158-183 


: 202-211 


222-231 


• 261-270 


58_5 


• 184-203 


260-269 


: 275-299 


t 330-344 


372-381 


: 424-433 


IftR 3 . i i 


236.6 


138-147 


163-172 


: 187-198 


! 244-261 


268-278 


308-317 


310.8 


■ 131-140 


144-153 


1 177-186 


! 190-199 


. 204-213 


"216-227 


601.1 


' 208-218 




I 


i 






544_3 


. 170-179 


184-193 


! 224-235 


1 274-287 


327-336 


352-361 


662 1 


87 7 I : 1 ' 


120 1 • ' ' 
Table 4



ORF 1 




Antif^enic! Regions ! 


(cont) ! 




I 


Regi nil 


Region 1 Z ! 


Region 1 3 1 


Region 1 4 : Region 1 5 I 


Region 1 6 


168 6 i 1 1 ' i 


238 1 ^ 1 1 : - 


51.2 ; 




278_3 1 


i ! 


276 2 i i 




45_4 ! 




1 


1 


: 1 


.... 


3153 ' _i 




= 




1 ^ 

1 j*r_ 1 ^ 






! 














228_6 


i 




T- 
1 




50_1 


\ 




1 




112_7 


\ 




j 




442 1 






i 




66.2 


! 




i 




304_2 








- 


44.1 










161_4 










46_S 


306-315 : 








942_1 


i 




i 




5.4 


393-407 ! 


416-426 


456-465 


1 




20.4 


396-405 ! 


410-419 


461-481 






328.2 


1 




1 




S20.2 


1 




1 
1 




771_1 


1 








999.1 










853_1 


! 








287.1 










288_2 ! 








596_2 1 


1 : 




217_5 1 1 






217_6 1 








528.3 










171.11 










63.4 










353 2 






i 




743-.1 








342.4 








69.3 




i 






70.6 


453-471 


505-515 


1 


! 




129.2 


t 296-315 




1 






58 5 ' ; ; 


188_3 ' ! 




236_6 


I 358-377 


410-423 


428-439 


442-457 467-476 


'480-493 


310_8 


; 238-251 


256-275 


281-290 


1296-310 314-333 


3^8-347 


601.1 i ; ! ■ 1 


544.3 ; - ^ 1 - -i - - ' ^ 


662 1 * ! • i 


87.7 I 




i 






120 1 ; ' 
Table 4






ORF Antigenic! Regions (cont) 1 


• Reqionir Region 18 ! Region IS Region 20 1 Region 21 Regi n 22 


168 6 i ! i ._ L . . . 


238 1 • ' I ! 


SI 2 = \ ' _ • 


77fl 3 I ! i 


276 2 i : 




45 4 t i _ _ 


I 


^ifi a : 1 . 1 i - 


154_15 1 






228_3 ' 


i 1 




228_6 : 


t 

t ► 






50^1 1 






112_7 




1 
1 






442_1 




1 






56_2 










304_2 




r 






44_1 




1 






161_4 




) 


! - - 


46_5 




I 


r 


942_1 




1 




5_4 ! 




1 


20 4 I 




! 


328_2 




! 


i 


520 2 






i 


771 1 j 






999„1 ! 




I 


853 1 1 




1 


287_1 1 




i 


288.2 " 






596_2 ; 






217_5 = 






217 6 






528_3 I 


1 [ ; 


171_n ' 


< i 


A . 1 1 1 


353_2 


III! 


743-1 






.^42 4 1 


i 


69^3 : ! < ' 


70 6 ! i ? - ? 


129 2 ; \ ■ 


58 5 i j ' 


188»3 i : . 


236 6 i 


?in fl 357-366 370-379 :429-438 443-452 .478-487 551-560 


601 1 ! 




544 3 : 


fifi? 1 

87 7 ■ 


120 1 • 
Table 4



ORF 



168,6 



Region Z3 



Antigenic- Regi ns , - 

Re gion 24 ! Region 25 ! Re gio n 26 : R g ion 27 I Re gion 28 



l(cont) 



238.1 



51,2 



278_3 



276,2 
45,4 

154.15 
228, 3 



228,6 



50,1 



112,^^ !_ 



442,1 



66,2 



304,2 



44,1 



161,4 



46.5 



942,1 



5,^ 



20,4 



328,2 



520,2 



771,1 



999,1 



853_1 



287,1 



288,2 



596,2 



217_5 



217_6 



528,3 



171,n 



63_4 



353^2 
>43,1 



342_4 



69.3 



70.6 



129.2 



58, 5 
188.3 



236,6 



310.8 622-632 

601.1 

544, 3 ■ 

662 ,1 

JT^T 

120,1 



"670-685 1708-718 823-836 858-867 =877-886 
Table 4






ORF 


Antigenic! Regions ;(cont) 




Region 29 : 


Region 30 ; 




168_6 i ! 


238.1 


i 






51_2 


1 

1 






278_3 ! i 


276_2 1 1 


45.4 i 


! 






316_8 


154.15 • 1 j 


228_3 : 1 


228.6 


1 




50.1 






1 1 2.7 






442_1 ! 






66.2 1 






304.2 






44.1 






161_4 






46.5 i 


1 


942_1 


I 


5_4 


i 


20 4 ! i 1 


328_2 




520.2 . 




771.1 




999.1 I 


! 


853.1 






287_1 ! 




288 2 : I 




596_2 i 




217_5 ■ 1 ! 


217 6 t ! i 


528.3 I \ _ 


171 11 ! i \ 


63_4 1 1 


1 


353.2 . 1 1 




74r.i ' 




342.4 1 1 




69" 3 ' 




70.6 ! 


129_2 ! 


58.5 








1 38.3 


236.6 i 


310.8 


601.1 








544.3 


662.1 


87 7 


120 1 
Table 4



ORF           BLAST HOMOLOG                   Antigenic Regions
                                              Region 1  Region 2  Region 3  Region 4
46_1    5241  aldehyde dehydrogenase          8-17      36-52     83-96     112-121
63_4    5242  glycerol ester hydrolase (P.    9-26      57-73     93-107    123-133
174_6   5243  ketopantoate hydroxymeth        71-80     203-212   242-254   265-274
206_16  5244  ornithine acetyltransferase     MO        34-43     54-63     j94-2lb
267_1   5245  NaH-antiporter protein (E.      120-129   332-347   398-408
322_1   5246  acriflavin resistance protein   58-75     153-164   203-231   264-284
415_2   5247  transport ATP-binding prot      108-126   218-227   298-308   315-334
214_3   5248  2-nitropropane dioxygenas       123036    216-233   283-292   297'-366
587_3   5249  clumping factor                 5-14      43-54     59-68     76-95
685_1   5250  signal peptidase                59-68     72-81     86-95     99-108
54_3    5251  fibronectin binding protein     23-32     37-46     50-59     89-98
54_4    5252  fibronectin binding protein     43-52     66-75     95-104    147-156
54_5    5253  fibronectin binding protein     49-60     81-90
54_6    5254  fibronectin binding protein     55-71     82-97     139-158   175-186
328_1   5255  lipoprotein (H. flu)            11-20     61-70     96-105





Table 4

ORF     Antigenic Regions (cont)
        Region 5  Region 6  Region 7  Region 8  Region 9  Region 10
46_1    215-242   333-352   376-385   416-432   471-487
63_4    145-154   191-202   212-223   245-265   274-283   291-300
174_6
206_16  239-259   275-284
267_1
322_1   298-319   350-359
415_2   344-353   371-380   395-404   456-465   486-495   518-527
214_3   318-337   365-375
587_3   105-115   142-151   156-166   173-182   186-198   204-213
685_1   113-122   130-145
54_3    128-138   185-194   217-226   251-250   258-277   295-305
54_4    175-188   191-200   203-212   220-229
54_5
54_6    220-230   287-304   317-326   344-353   364-373   378-387
328_1
Table 4



ORF 




! Antigenic! Regions i(cont) f 








Region 1 1 


• Region 1 2 


Region 13 


Reqi n 14 1 


Region 15 


Region 17 






63_4 


306-315 


: 319-328 


366-376 


395-420 


453-462 


_ 467:47 6 


174_6 i 










206_16 ; 








1 i 


267^1 \ \ \ 




322_1 ! 
415-2 


539-555 












214_3 ! ' 




! i 


587_3 


217-226 


= 278-287 


318-327 


332-342 


351-360 


_i 377-386 


54j 


316-325 


i 329-345 


355-372 


387-396 


416-425 




54_4 ' ' 










54_5 


1 










54_6 


396-407 


i 427-436 


514-531 


541-550 


569-578 


' 161 2-622 


328_1 


: 1 


1 



Table 4 



ORF I 




Antigenic; 


Regions j 


^cont) 


! 






Region 1 8 


Region 1 9 \ 


Region 20 < 


Region 21 i 


Region_22 _ 


Region_23 


4€_1 ' ' 1 










63_4 : 485-500 


513-525 










174_6 ; 










206_16 : 










267 1 i i 










322.1 1 














415_2 














2143 


1 

i 










587-3 


39S-405 


425-442 


459-470 


485-494 


505-514 


53V562 


685_1 


1 










54.3 


455-462 


472-491 


'517-536 








54„4 












54^5 - 














54_6 


639-648 


673-681 


703-715 


723-732 


749-760.. 


772^8X7 


328-1 
Table 4



ORF ' Antigenic- Regions (cont) ! 1 ........ 


Region 24 t 

46.1 ! 


Region 25 \ Region 26 Region 27 \ Regipn_28__; .Region J 9_ 


63 4 ■ i i 1 . -i 




174 6 . ' 1 1 i 


Z06_16 : i ! 




267^1 : i i j 




322 1 : f 1 




._ . . 


415 2 


J 


214.3 i 






587_3 567-578 = 584-601 : 607-840 


844-854 


858-870 


^877-886^ 


G85 .1 1 






54_3 1 


i 








54_4 


1 






54_5 : 


1 






54_6 793-802 


811-826 834-848 


866-876 


893-903 


907-918 


328_1 1 : 









Table 4 



ORF 1 Antigenic Regions '(cont) 


! Region 30 ■ Region 31 1 


45_1 ! 




63.4 


174_6 


206„1 6 ! 




267_1 1 ; 




322_1 1 ^ 




415_2 




214_3 1 i 




587_3 1889-911 .927-936 




685^1 1 1 




54_3 ! 1 




54_4 1 1 




54.5 1 1 




54_6 ; 925-944 ^951-997 




328_1 1 1 
SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: Human Genome Sciences, Inc. 

(B) STREET: 9410 Key West Avenue 

(C) CITY: Rockville 

(D) STATE: Maryland 

(E) COUNTRY: US 

(F) POSTAL CODE: 20850 

(ii) TITLE OF INVENTION: Staphylococcus aureus Poly- 
nucleotides and Sequences 

(iii) NUMBER OF SEQUENCES: 5255 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette, 3.50 inch, 1.4 Mb storage 

(B) COMPUTER: HP Vectra 486/33 

(C) OPERATING SYSTEM: MSDOS version 6.2 

(D) SOFTWARE: ASCII Text 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 60/009,861 

(B) FILING DATE: 05-JAN-1996



(2) INFORMATION FOR SEQ ID NO: 1:
TAAAGCTGTT GAATAATTTT AGTGCCTAAA CCATCAATAT TcATGGCTTG TCTTGaTACA 13500

AAGTGnATCa ATCCtTcAAC AAGTTGTGCT TGGTCATTTT GG 13542


(2) INFORMATION FOR SEQ ID NO: 155:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1893 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 155:








CAGTAAACAC CTCTGATTAC GAATATTTAT ACATTTATTT TAACACATGC ACTGATTTAC 60

GACTACTAAA CACCTTTACG TAAAAAGGGT AAACATGGTT TATCTATCTT GGTTATCTAT 120

TTATAAATAT TTnTCATATT ACGCATAACA ATTGCTTAAA ATATGTATAA AAATGAATAT 180

ATGTGTAATA AACTTGCTAA TTATTAGATT TAATAAGCGT CAATTGTTTG AACATATTtA 240

ATTAAAATCA CATTGATATC ACAGATACGA ATATTGTCGT ATAGAAATTG AAAATTCTAT 300

TTTTTAAATG AAAGTCTTCA ACATAATTTT AAGTTTCAAC ATGAGAAAAA TCGATTAACA 360

AACAACGTCA GTTGAATATG CCTTTTGAGA CATTTCAAAC TTTACAATTG TTGCTAATCG 420

ATATATTTGC TTTTAGTGAT CCCTGCTATA AAAjr^^TCA ACGATTTCTA ATAAGTGTTT 480

TGTATTGAAT TGTTCATCAA TTTGCGTTAG TTCATCCACT GCTGCGTCTC TATGATAAGT 540

CAATTTATCT TCTGCGCCAT CTTTCCCTAA TAAACTCACG TACGTACTTT TATTATTTTC 600

AAGATCGCTG CCCACTTTTT TACCTAACTT TGCTTCATCA CCATAGCAGT CTAATAAATC 660

ATCTTTAATC TGGAACATCA TACCTAAATG ATAACTATAA CTTTCTAAAT GTTCTTTAGT 720

TGT^TCATCG ACATTAGCGA TATCTGCTGC ACTCATAACC GCAAAAGTTA ATAATGCTCC 780

TGTTTTTGTT TTGTGTATCA TTTCCAAAGT TTCAAGATCA ATTGGTTGGC CTTCGCTTTG 840

CATATCTAAC ATTTGACCGC CGACCATTCC AACATGACCA CTTGCTATTG ACAGCCGTTG 900

TAGAACTTTT ATTTTTACTT CATCAGTTAA TCTATCATCA CTTGAAATAA GTTCAAATGC 960

TTTAGTTAAT AAAGCATCAC CTGCTAATAT CGCAGTCCAC TCACCATATA CTTTATGATT 1020

TGTTAATTTT CCTCGTCGAT AATCATCATT ATCCATCGCT GGTAGGTCAT CATGAATAAG 1080

TGAATATGTA TGAATCATTT CTAGTGCAAT TGCGCTCTTC ATACCTAACT CATACTCGGT 1140

ATTTAGTGAA TCTAAAGTGA GTAATAACAG AACTGGTCGG ATGCGTTTAC CTCCAGCATT 1200

TAATGAATAC AACATACTTT CTTCTAGCTG AGTATCCATT ACTGATTTAT TTATCGCAAcj 1260

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CATCCTCAGC TTCTTCTTTT ATTAAGTCAT TCACCTTTTT TTCGGCATTT TTTAAAGTTG 1380 

TGTCACAAGC TGCTGATAGT TTCATACCAC GTTGATATAA ATCTAATGAT TCCTCTAAAG 1440 

ATACTGTTTC ATTATCTAAT TTTTGAACAA TTTGCTCTAA TTCTTGCATC ATTTCTTCAA 1500

AACTTTGCGT TTCTTTAGTC ATTATTACAC CTTACTTTCG TAACTTTTGC ATCTACTAAG 1560

CCATCTTTCA TTGTTAACGT CAATTGATCA TTTTCTGTTA AATCTTTAGT ACTCGTAATG 1620

ACrrCGTCTT TTTTATTAAC AATTGCATAT CCACGCAACA TTGTATTAGT TGGACTTAAA 1680

TTGTTTAAGT TTTCTACTTT ATTTTTCAAA TCATTTTTAT AACTTAATAT CTTAGAATTC 1740

AATAATITAA CAAGTTGGTT TGTCAATTGA AGATTATnTT GTTGTTCTTG ATTAACACTA 1800

CTTAGTAATG CTTTTAAATn ATAACOTTGG TGCAACAGCA TTAAATCGAG GCCCCGGTGG 1860

TCCAAAGTTG CCCGAATTnG TGGTTTCAGG CCC 1893

(2) INFORMATION FOR SEQ ID NO: 156:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 821 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 156:

AAAATATATT CCTTCACTTA ATATTCAATT AGAGAAAAAC ATGGTGATTG TAATATGTTG 60 

TGCAATATTT CTGGGTGTTT TAATACTTTT TTTATTTCTG AATCGTAAGC TAAGGTTGGA 120 

AATTTATAAT AATAACTCTA GTAAAGGGAA AATAATTTTA TTTCCTTCAT TAAAAAACTT 180 

TTGTTTCACA ATATTTTATT ATTTTTTATT TGGCGGTCTT TCAATAATGG CTCTAAGTAT 240

GTTATTAACT TTAAATCCTC AAAATATAAT AGGCTTTATT GGTTGGTTGG TAATGACTGC 300 

AGGTTTCTTT CTGTTAAACA TGTCATCGAT TATTGACAAA AAAATTTATG TATTATCTAA 360 


AACTAACACG GTGGAAAAAT GATGGTTTAG CTGGATTTAC TGCAGGTTCT ATTTCGGCAA 420 

TACTTGTATA TTGGACCAAT CAAAAAAATG AATTTGGAAT AAAAGATAAA AACGATTGGA 480 

TAGGACATAA ACTAGACGTT GGTATAGATG CTGTAGAAAA ATCTGCAGAA AAAACAGTAG 540

ATGGTGTTGA AAATGTCATG GTGAAGCTTC AAAAAGTATT TCTAATCATA TAAGCCCTAA 600 

GAAATGGAGC TGGTAAATGT TGCTATGCGA ATCTAAAATC ATCAATAAAA ACCCAAAATA 660 

TAGAATTATT AAATATAATG ATGAATACTT AATGGTCGAT ATAATAAGCA CTTGGATTAG 720

TTTATTTTTT CCTTTTATTA ATTGGTTCAT CCCaAAAGaA TACGTCAAAA TTAGTAGAGA 780 




